Hello,

Currently I'm trying to set up SLURM on a small GPU cluster (node names
smurf0[1-7]), using the gres/gpu plugin to allocate GPUs to the jobs that
request them.

Unfortunately, when I submit a GPU job (with any number of GPUs,
--gres=gpu:N), SLURM refuses to run it and reports that the requested
node configuration is not available.
I have attached logs and configuration files that should provide the
information needed to analyze this issue.

Note: cross-posted at http://serverfault.com/questions/685258

Example (test.sh simply echoes $CUDA_VISIBLE_DEVICES):

    srun -n1 --gres=gpu:1 test.sh
        --> srun: error: Unable to allocate resources: Requested node configuration is not available

The slurmctld log for such calls shows:

    gres: gpu state for job X
        gres_cnt:1 node_cnt:1 type:(null)
        _pick_best_nodes: job X never runnable
        _slurm_rpc_allocate_resources: Requested node configuration is not available

Jobs requesting any of the other configured generic resources complete
successfully:

    srun -n1 --gres=gram:500 test.sh 
        --> CUDA_VISIBLE_DEVICES=NoDevFiles
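
The contents of test.sh aren't shown, so here is a hypothetical Python equivalent that prints the variable and also interprets it. It assumes that an unset or empty CUDA_VISIBLE_DEVICES, or the literal value NoDevFiles printed by the gram job above, means no GPU was allocated, and that otherwise the value is a comma-separated list of integer indices. The function name visible_gpu_ids is made up for illustration.

```python
import os
from typing import List, Mapping, Optional


def visible_gpu_ids(env: Optional[Mapping[str, str]] = None) -> List[int]:
    """Return the GPU indices exposed to the job via CUDA_VISIBLE_DEVICES.

    Assumption: unset, empty, or "NoDevFiles" means no GPU was allocated;
    otherwise the value is a comma-separated list of integer indices.
    """
    if env is None:
        env = os.environ
    value = env.get("CUDA_VISIBLE_DEVICES", "").strip()
    if value in ("", "NoDevFiles"):
        return []
    return [int(tok) for tok in value.split(",") if tok.strip()]


if __name__ == "__main__":
    # Same information test.sh echoes, plus an interpretation of it.
    raw = os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>")
    ids = visible_gpu_ids()
    print(f"CUDA_VISIBLE_DEVICES={raw} -> {ids if ids else 'no GPUs allocated'}")
```

It can be launched the same way as test.sh, e.g. `srun -n1 --gres=gram:500 python3 gpu_ids.py` (the file name is hypothetical).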

The node and GRES configuration in slurm.conf (also attached) looks like
this:

    GresTypes=gpu,ram,gram,scratch
    ...
    NodeName=smurf01 NodeAddr=192.168.1.101 Feature="intel,fermi" Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 Gres=gpu:tesla:8,ram:48,gram:no_consume:6000,scratch:1300
    NodeName=smurf02 NodeAddr=192.168.1.102 Feature="intel,fermi" Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=1 Gres=gpu:tesla:8,ram:48,gram:no_consume:6000,scratch:1300
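
To make the structure of these Gres= values explicit (gpu carries the type tesla, whereas gram carries the no_consume flag and no type), here is a simplified Python parser. It is a sketch, not Slurm's own parser: it assumes each comma-separated item has the form name[:type][:no_consume]:count with a plain integer count (no K/M/G suffix), and the GresSpec/parse_gres names are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GresSpec:
    name: str
    type: Optional[str]  # e.g. "tesla"; None if no type was given
    no_consume: bool     # True if the item carries the no_consume flag
    count: int


def parse_gres(value: str) -> List[GresSpec]:
    """Split a slurm.conf Gres= value into name, type, flag and count.

    Assumes name[:type][:no_consume]:count items with integer counts.
    """
    specs = []
    for item in value.split(","):
        fields = item.strip().split(":")
        name, count = fields[0], int(fields[-1])
        middle = fields[1:-1]  # optional type and/or no_consume flag
        no_consume = "no_consume" in middle
        types = [f for f in middle if f != "no_consume"]
        specs.append(GresSpec(name, types[0] if types else None, no_consume, count))
    return specs


if __name__ == "__main__":
    for spec in parse_gres("gpu:tesla:8,ram:48,gram:no_consume:6000,scratch:1300"):
        print(spec)
```

For gpu:tesla:8 this yields name gpu, type tesla, count 8. A plain --gres=gpu:1 request carries no type, which corresponds to the type:(null) shown in the slurmctld job log above.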

The corresponding gres.conf files contain:
    Name=gpu Count=8 Type=tesla File=/dev/nvidia[0-7]
    Name=ram Count=48
    Name=gram Count=6000
    Name=scratch Count=1300
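
The File=/dev/nvidia[0-7] range should expand to exactly eight device files, matching Count=8. A minimal Python sketch of that expansion follows; it handles only a single [...] group of comma-separated numbers or a-b ranges (which also covers the smurf0[1-7] host list), and the function name expand_range is hypothetical. Slurm's own hostlist handling supports more syntax than this.

```python
import re
from typing import List


def expand_range(expr: str) -> List[str]:
    """Expand one bracket group, e.g. /dev/nvidia[0-7] or smurf0[1-7].

    Sketch only: supports a single [...] group holding comma-separated
    integers or a-b ranges; anything without brackets is returned as-is.
    """
    match = re.fullmatch(r"(.*)\[([0-9,-]+)\](.*)", expr)
    if match is None:
        return [expr]
    prefix, body, suffix = match.groups()
    names = []
    for part in body.split(","):
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            names.extend(f"{prefix}{i}{suffix}" for i in range(lo, hi + 1))
        else:
            names.append(f"{prefix}{int(part)}{suffix}")
    return names


if __name__ == "__main__":
    devices = expand_range("/dev/nvidia[0-7]")
    # gres.conf declares Count=8 for this File= range.
    assert len(devices) == 8, devices
    print(devices)
```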

The output of "scontrol show node" lists all the nodes with the correct GRES
configuration, e.g.:

    NodeName=smurf01 Arch=x86_64 CoresPerSocket=6
       CPUAlloc=0 CPUErr=0 CPUTot=24 CPULoad=0.01 Features=intel,fermi
       Gres=gpu:tesla:8,ram:48,gram:no_consume:6000,scratch:1300
       ...etc.

As far as I can tell, the slurmd daemons on the nodes recognize the GPUs
(and the other generic resources) correctly.

The slurmd log on smurf01 says:

    Gres Name=gpu Type=tesla Count=8 ID=7696487 File=/dev/nvidia[0-7]

The slurmctld log shows:

    gres/gpu: state for smurf01
       gres_cnt found:8 configured:8 avail:8 alloc:0
       gres_bit_alloc:
       gres_used:(null)
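
These per-node state lines can be checked mechanically. The Python sketch below parses a gres_cnt line in the format of the attached slurmctld log (found may still read TBD before a node registers) and applies a rough, count-only test of whether a request for N units would fit. This is an illustration only: the real scheduler also weighs CPUs, memory, GRES types and topology, so a passing count check does not mean Slurm will accept the job. The function names are made up.

```python
import re

STATE_RE = re.compile(
    r"gres_cnt found:(?P<found>\S+) configured:(?P<configured>\d+) "
    r"avail:(?P<avail>\d+)(?: alloc:(?P<alloc>\d+))?(?P<no_consume> no_consume)?"
)


def parse_state(line: str) -> dict:
    """Parse one 'gres_cnt found:... configured:... avail:...' log line."""
    match = STATE_RE.search(line)
    if match is None:
        raise ValueError(f"not a gres_cnt line: {line!r}")
    found = match.group("found")
    return {
        "found": None if found == "TBD" else int(found),  # TBD before registration
        "configured": int(match.group("configured")),
        "avail": int(match.group("avail")),
        "alloc": int(match.group("alloc") or 0),
        "no_consume": match.group("no_consume") is not None,
    }


def count_fits(state: dict, requested: int) -> bool:
    """Rough count-only check (assumption): no_consume resources are never
    used up; consumable ones need `requested` units still unallocated."""
    if state["no_consume"]:
        return requested <= state["avail"]
    return requested <= state["avail"] - state["alloc"]


if __name__ == "__main__":
    smurf01 = parse_state("  gres_cnt found:8 configured:8 avail:8 alloc:0")
    print(smurf01, count_fits(smurf01, 1))
```

On counts alone, smurf01 has room for --gres=gpu:1, which is consistent with the puzzle described here: the counts look fine, yet the job is rejected.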

I can't figure out why the controller node states that jobs using
--gres=gpu:N are "never runnable" and why "the requested node configuration
is not available".
Any help is appreciated.

Kind regards,
Daniel Weber

PS: If further information is required, don't hesitate to ask.
# GENERAL

ClusterName=egpc
ControlMachine=wtch020
ControlAddr=192.168.1.1
AuthType=auth/munge
CryptoType=crypto/munge
CacheGroups=0
DisableRootJobs=YES
MpiDefault=none
Proctracktype=proctrack/cgroup

# DAEMONS
StateSaveLocation=/var/spool/slurmctld
SlurmctldPidFile=/var/run/slurm/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
SwitchType=switch/none

# SCRIPTS
Prolog=/apps/.slurm/job_prolog.sh
Epilog=/apps/.slurm/job_epilog.sh
TaskProlog=/apps/.slurm/task_prolog.sh
TaskEpilog=/apps/.slurm/task_epilog.sh

# TIMERS AND LIMITS
MaxJobCount=5000 
ReturnToService=1
TaskPlugin=task/cgroup
InactiveLimit=0
KillWait=30
MinJobAge=120
OverTimeLimit=60
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0

# SCHEDULING 
#DefMemPerCPU=0 
#MaxMemPerCPU=0 
FastSchedule=1
#SchedulerRootFilter=1 
#SchedulerTimeSlice=30 
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/linear
#SelectTypeParameters=


# JOB PRIORITY 
#PriorityFlags= 
PriorityType=priority/multifactor
PriorityDecayHalfLife=7-0
PriorityFavorSmall=YES
#PriorityCalcPeriod= 
#PriorityUsageResetPeriod= 
PriorityMaxAge=3-0
PriorityWeightAge=1
#PriorityWeightFairshare=0
PriorityWeightJobSize=1
#PriorityWeightQOS= 
#PriorityWeightPartition= 

# ACCOUNTING 
#AccountingStorageEnforce=0 

AccountingStorageType=accounting_storage/slurmdbd
AccountingStoragePort=6819
AccountingStorageUser=slurm
AccountingStoreJobComment=YES
JobAcctGatherFrequency=60
JobAcctGatherType=jobacct_gather/linux

# LOGGING
SlurmctldDebug=5
SlurmdDebug=5
DebugFlags=Gres
SlurmctldLogFile=/var/log/slurm/ctld
SlurmdLogFile=/var/log/slurmd
#SlurmSchedLogFile= 
#SlurmSchedLogLevel= 

# COMPUTE NODES 

GresTypes=gpu,ram,gram,scratch

NodeName=smurf01 NodeAddr=192.168.1.101 Feature="intel,fermi" Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 Gres=gpu:tesla:8,ram:48,gram:no_consume:6000,scratch:1300
NodeName=smurf02 NodeAddr=192.168.1.102 Feature="intel,fermi" Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=1 Gres=gpu:tesla:8,ram:48,gram:no_consume:6000,scratch:1300
NodeName=smurf03 NodeAddr=192.168.1.103 Feature="intel,fermi" Boards=1 SocketsPerBoard=2 CoresPerSocket=4 ThreadsPerCore=1 Gres=gpu:gtx:3,ram:94,gram:no_consume:1500,scratch:280
NodeName=smurf04 NodeAddr=192.168.1.104 Feature="intel,fermi" Boards=1 SocketsPerBoard=2 CoresPerSocket=4 ThreadsPerCore=1 Gres=gpu:gtx:4,ram:94,gram:no_consume:1500,scratch:280
NodeName=smurf05 NodeAddr=192.168.1.105 Feature="intel,kepler" Boards=1 SocketsPerBoard=2 CoresperSocket=8 ThreadsPerCore=2 Gres=gpu:gtx:4,ram:256,gram:no_consume:6000,scratch:2400
NodeName=smurf06 NodeAddr=192.168.1.106 Feature="intel,fermi" Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 Gres=gpu:gtx:2,ram:8,gram:no_consume:1250,scratch:1800
NodeName=smurf07 NodeAddr=192.168.1.107 Feature="amd,fermi" Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=1 Gres=gpu:gtx:2,ram:16,gram:no_consume:1250,scratch:54

# PARTITIONS

PartitionName=work Nodes=smurf0[1-7] Default=YES MaxTime=INFINITE State=UP
#PartitionName=supermicro Nodes=smurf03,smurf04,smurf05 MaxTime=INFINITE State=UP
#PartitionName=tower Nodes=smurf06,smurf07 MaxTime=INFINITE State=UP
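
As a cross-check of the topology fields in the NodeName= lines above, the Python sketch below multiplies Boards, SocketsPerBoard, CoresPerSocket and ThreadsPerCore for one line. For smurf01 it gives 1 x 2 x 6 x 2 = 24, matching CPUTot=24 from scontrol and CPUs=24 in the slurmd log. It assumes keys are case-insensitive (smurf05 spells one of them CoresperSocket) and that a missing field counts as 1; the function name node_cpus is hypothetical.

```python
def node_cpus(node_line: str) -> int:
    """Boards * SocketsPerBoard * CoresPerSocket * ThreadsPerCore for one
    NodeName= line. Keys are compared case-insensitively; a missing field
    is treated as 1 (an assumption of this sketch)."""
    fields = {}
    for token in node_line.split():
        key, _, value = token.partition("=")
        fields[key.lower()] = value
    cpus = 1
    for key in ("boards", "socketsperboard", "corespersocket", "threadspercore"):
        cpus *= int(fields.get(key, "1"))
    return cpus


if __name__ == "__main__":
    smurf01 = ('NodeName=smurf01 NodeAddr=192.168.1.101 Feature="intel,fermi" '
               "Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 "
               "Gres=gpu:tesla:8,ram:48,gram:no_consume:6000,scratch:1300")
    print(node_cpus(smurf01))  # 24
```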

[2015-05-06T14:58:13.476] slurmctld version 14.11.5 started on cluster egpc
[2015-05-06T14:58:13.477] Munge cryptographic signature plugin loaded
[2015-05-06T14:58:13.477] debug:  init: Gres GPU plugin loaded
[2015-05-06T14:58:13.477] debug:  gres: Couldn't find the specified plugin name for gres/ram looking at all files
[2015-05-06T14:58:13.477] debug:  Cannot find plugin of type gres/ram, just track gres counts
[2015-05-06T14:58:13.477] debug:  gres: Couldn't find the specified plugin name for gres/gram looking at all files
[2015-05-06T14:58:13.477] debug:  Cannot find plugin of type gres/gram, just track gres counts
[2015-05-06T14:58:13.478] debug:  gres: Couldn't find the specified plugin name for gres/scratch looking at all files
[2015-05-06T14:58:13.478] debug:  Cannot find plugin of type gres/scratch, just track gres counts
[2015-05-06T14:58:13.478] preempt/none loaded
[2015-05-06T14:58:13.478] debug:  Checkpoint plugin loaded: checkpoint/none
[2015-05-06T14:58:13.478] debug:  AcctGatherEnergy NONE plugin loaded
[2015-05-06T14:58:13.478] debug:  AcctGatherProfile NONE plugin loaded
[2015-05-06T14:58:13.479] debug:  AcctGatherInfiniband NONE plugin loaded
[2015-05-06T14:58:13.479] debug:  AcctGatherFilesystem NONE plugin loaded
[2015-05-06T14:58:13.479] debug:  Job accounting gather LINUX plugin loaded
[2015-05-06T14:58:13.479] ExtSensors NONE plugin loaded
[2015-05-06T14:58:13.479] debug:  switch NONE plugin loaded
[2015-05-06T14:58:13.479] debug:  No backup controller to shutdown
[2015-05-06T14:58:13.479] Accounting storage SLURMDBD plugin loaded with AuthInfo=(null)
[2015-05-06T14:58:13.480] debug:  auth plugin for Munge (http://code.google.com/p/munge/) loaded
[2015-05-06T14:58:13.481] debug:  slurmdbd: Sent DbdInit msg
[2015-05-06T14:58:13.481] slurmdbd: recovered 0 pending RPCs
[2015-05-06T14:58:13.805] debug:  Reading slurm.conf file: /etc/slurm/slurm.conf
[2015-05-06T14:58:13.807] layouts: no layout to initialize
[2015-05-06T14:58:13.807] topology NONE plugin loaded
[2015-05-06T14:58:13.807] debug:  No DownNodes
[2015-05-06T14:58:13.807] sched: Backfill scheduler plugin loaded
[2015-05-06T14:58:13.807] route default plugin loaded
[2015-05-06T14:58:13.808] layouts: loading entities/relations information
[2015-05-06T14:58:13.808] debug:  layouts: 7/7 nodes in hash table, rc=0
[2015-05-06T14:58:13.808] debug:  layouts: loading stage 1
[2015-05-06T14:58:13.808] debug:  layouts: loading stage 2
[2015-05-06T14:58:13.808] Recovered state of 7 nodes
[2015-05-06T14:58:13.808] gres: gpu state for job 120
[2015-05-06T14:58:13.808]   gres_cnt:1 node_cnt:0 type:(null)
[2015-05-06T14:58:13.808] Recovered JobID=120 State=0x5 NodeCnt=0 Assoc=0
[2015-05-06T14:58:13.808] Recovered information about 1 jobs
[2015-05-06T14:58:13.808] init_requeue_policy: kill_invalid_depend is set to 0
[2015-05-06T14:58:13.808] gres/gpu: state for smurf01
[2015-05-06T14:58:13.808]   gres_cnt found:TBD configured:8 avail:8 alloc:0
[2015-05-06T14:58:13.808]   gres_bit_alloc:
[2015-05-06T14:58:13.808]   gres_used:(null)
[2015-05-06T14:58:13.808]   type_cnt_alloc[0]:0
[2015-05-06T14:58:13.808]   type_cnt_avail[0]:8
[2015-05-06T14:58:13.808]   type[0]:tesla
[2015-05-06T14:58:13.808] gres/ram: state for smurf01
[2015-05-06T14:58:13.808]   gres_cnt found:TBD configured:48 avail:48 alloc:0
[2015-05-06T14:58:13.808]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.808]   gres_used:(null)
[2015-05-06T14:58:13.808] gres/gram: state for smurf01
[2015-05-06T14:58:13.808]   gres_cnt found:TBD configured:6000 avail:6000 no_consume
[2015-05-06T14:58:13.808]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.808]   gres_used:(null)
[2015-05-06T14:58:13.808] gres/scratch: state for smurf01
[2015-05-06T14:58:13.808]   gres_cnt found:TBD configured:1300 avail:1300 alloc:0
[2015-05-06T14:58:13.808]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.808]   gres_used:(null)
[2015-05-06T14:58:13.808] gres/gpu: state for smurf02
[2015-05-06T14:58:13.808]   gres_cnt found:TBD configured:8 avail:8 alloc:0
[2015-05-06T14:58:13.808]   gres_bit_alloc:
[2015-05-06T14:58:13.808]   gres_used:(null)
[2015-05-06T14:58:13.808]   type_cnt_alloc[0]:0
[2015-05-06T14:58:13.808]   type_cnt_avail[0]:8
[2015-05-06T14:58:13.808]   type[0]:tesla
[2015-05-06T14:58:13.808] gres/ram: state for smurf02
[2015-05-06T14:58:13.808]   gres_cnt found:TBD configured:48 avail:48 alloc:0
[2015-05-06T14:58:13.808]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.808]   gres_used:(null)
[2015-05-06T14:58:13.808] gres/gram: state for smurf02
[2015-05-06T14:58:13.808]   gres_cnt found:TBD configured:6000 avail:6000 no_consume
[2015-05-06T14:58:13.808]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.808]   gres_used:(null)
[2015-05-06T14:58:13.808] gres/scratch: state for smurf02
[2015-05-06T14:58:13.808]   gres_cnt found:TBD configured:1300 avail:1300 alloc:0
[2015-05-06T14:58:13.808]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.808]   gres_used:(null)
[2015-05-06T14:58:13.808] gres/gpu: state for smurf03
[2015-05-06T14:58:13.808]   gres_cnt found:TBD configured:3 avail:3 alloc:0
[2015-05-06T14:58:13.808]   gres_bit_alloc:
[2015-05-06T14:58:13.808]   gres_used:(null)
[2015-05-06T14:58:13.808]   type_cnt_alloc[0]:0
[2015-05-06T14:58:13.808]   type_cnt_avail[0]:3
[2015-05-06T14:58:13.808]   type[0]:gtx
[2015-05-06T14:58:13.808] gres/ram: state for smurf03
[2015-05-06T14:58:13.808]   gres_cnt found:TBD configured:94 avail:94 alloc:0
[2015-05-06T14:58:13.808]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.808]   gres_used:(null)
[2015-05-06T14:58:13.808] gres/gram: state for smurf03
[2015-05-06T14:58:13.808]   gres_cnt found:TBD configured:1500 avail:1500 no_consume
[2015-05-06T14:58:13.808]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.808]   gres_used:(null)
[2015-05-06T14:58:13.808] gres/scratch: state for smurf03
[2015-05-06T14:58:13.808]   gres_cnt found:TBD configured:280 avail:280 alloc:0
[2015-05-06T14:58:13.808]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.808]   gres_used:(null)
[2015-05-06T14:58:13.808] gres/gpu: state for smurf04
[2015-05-06T14:58:13.808]   gres_cnt found:TBD configured:4 avail:4 alloc:0
[2015-05-06T14:58:13.809]   gres_bit_alloc:
[2015-05-06T14:58:13.809]   gres_used:(null)
[2015-05-06T14:58:13.809]   type_cnt_alloc[0]:0
[2015-05-06T14:58:13.809]   type_cnt_avail[0]:4
[2015-05-06T14:58:13.809]   type[0]:gtx
[2015-05-06T14:58:13.809] gres/ram: state for smurf04
[2015-05-06T14:58:13.809]   gres_cnt found:TBD configured:94 avail:94 alloc:0
[2015-05-06T14:58:13.809]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.809]   gres_used:(null)
[2015-05-06T14:58:13.809] gres/gram: state for smurf04
[2015-05-06T14:58:13.809]   gres_cnt found:TBD configured:1500 avail:1500 no_consume
[2015-05-06T14:58:13.809]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.809]   gres_used:(null)
[2015-05-06T14:58:13.809] gres/scratch: state for smurf04
[2015-05-06T14:58:13.809]   gres_cnt found:TBD configured:280 avail:280 alloc:0
[2015-05-06T14:58:13.809]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.809]   gres_used:(null)
[2015-05-06T14:58:13.809] gres/gpu: state for smurf05
[2015-05-06T14:58:13.809]   gres_cnt found:TBD configured:4 avail:4 alloc:0
[2015-05-06T14:58:13.809]   gres_bit_alloc:
[2015-05-06T14:58:13.809]   gres_used:(null)
[2015-05-06T14:58:13.809]   type_cnt_alloc[0]:0
[2015-05-06T14:58:13.809]   type_cnt_avail[0]:4
[2015-05-06T14:58:13.809]   type[0]:gtx
[2015-05-06T14:58:13.809] gres/ram: state for smurf05
[2015-05-06T14:58:13.809]   gres_cnt found:TBD configured:256 avail:256 alloc:0
[2015-05-06T14:58:13.809]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.809]   gres_used:(null)
[2015-05-06T14:58:13.809] gres/gram: state for smurf05
[2015-05-06T14:58:13.809]   gres_cnt found:TBD configured:6000 avail:6000 no_consume
[2015-05-06T14:58:13.809]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.809]   gres_used:(null)
[2015-05-06T14:58:13.809] gres/scratch: state for smurf05
[2015-05-06T14:58:13.809]   gres_cnt found:TBD configured:2400 avail:2400 alloc:0
[2015-05-06T14:58:13.809]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.809]   gres_used:(null)
[2015-05-06T14:58:13.809] gres/gpu: state for smurf06
[2015-05-06T14:58:13.809]   gres_cnt found:TBD configured:2 avail:2 alloc:0
[2015-05-06T14:58:13.809]   gres_bit_alloc:
[2015-05-06T14:58:13.809]   gres_used:(null)
[2015-05-06T14:58:13.809]   type_cnt_alloc[0]:0
[2015-05-06T14:58:13.809]   type_cnt_avail[0]:2
[2015-05-06T14:58:13.809]   type[0]:gtx
[2015-05-06T14:58:13.809] gres/ram: state for smurf06
[2015-05-06T14:58:13.809]   gres_cnt found:TBD configured:8 avail:8 alloc:0
[2015-05-06T14:58:13.809]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.809]   gres_used:(null)
[2015-05-06T14:58:13.809] gres/gram: state for smurf06
[2015-05-06T14:58:13.809]   gres_cnt found:TBD configured:1250 avail:1250 no_consume
[2015-05-06T14:58:13.809]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.809]   gres_used:(null)
[2015-05-06T14:58:13.809] gres/scratch: state for smurf06
[2015-05-06T14:58:13.809]   gres_cnt found:TBD configured:1800 avail:1800 alloc:0
[2015-05-06T14:58:13.809]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.809]   gres_used:(null)
[2015-05-06T14:58:13.809] gres/gpu: state for smurf07
[2015-05-06T14:58:13.809]   gres_cnt found:TBD configured:2 avail:2 alloc:0
[2015-05-06T14:58:13.809]   gres_bit_alloc:
[2015-05-06T14:58:13.809]   gres_used:(null)
[2015-05-06T14:58:13.809]   type_cnt_alloc[0]:0
[2015-05-06T14:58:13.809]   type_cnt_avail[0]:2
[2015-05-06T14:58:13.809]   type[0]:gtx
[2015-05-06T14:58:13.809] gres/ram: state for smurf07
[2015-05-06T14:58:13.809]   gres_cnt found:TBD configured:16 avail:16 alloc:0
[2015-05-06T14:58:13.809]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.809]   gres_used:(null)
[2015-05-06T14:58:13.809] gres/gram: state for smurf07
[2015-05-06T14:58:13.809]   gres_cnt found:TBD configured:1250 avail:1250 no_consume
[2015-05-06T14:58:13.809]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.809]   gres_used:(null)
[2015-05-06T14:58:13.809] gres/scratch: state for smurf07
[2015-05-06T14:58:13.809]   gres_cnt found:TBD configured:54 avail:54 alloc:0
[2015-05-06T14:58:13.809]   gres_bit_alloc:NULL
[2015-05-06T14:58:13.809]   gres_used:(null)
[2015-05-06T14:58:13.809] Recovered state of 0 reservations
[2015-05-06T14:58:13.809] State of 0 triggers recovered
[2015-05-06T14:58:13.809] read_slurm_conf: backup_controller not specified.
[2015-05-06T14:58:13.809] Running as primary controller
[2015-05-06T14:58:13.809] Registering slurmctld at port 6817 with slurmdbd.
[2015-05-06T14:58:13.982] debug:  Priority MULTIFACTOR plugin loaded
[2015-05-06T14:58:13.983] debug:  power_save module disabled, SuspendTime < 0
[2015-05-06T14:58:17.005] debug:  Spawning registration agent for smurf[01-07] 7 hosts
[2015-05-06T14:58:17.009] gres/gpu: state for smurf06
[2015-05-06T14:58:17.009]   gres_cnt found:2 configured:2 avail:2 alloc:0
[2015-05-06T14:58:17.009]   gres_bit_alloc:
[2015-05-06T14:58:17.009]   gres_used:(null)
[2015-05-06T14:58:17.009]   topo_cpus_bitmap[0]:NULL
[2015-05-06T14:58:17.009]   topo_gres_bitmap[0]:0-1
[2015-05-06T14:58:17.009]   topo_gres_cnt_alloc[0]:0
[2015-05-06T14:58:17.009]   topo_gres_cnt_avail[0]:2
[2015-05-06T14:58:17.009]   type[0]:gtx
[2015-05-06T14:58:17.009]   type_cnt_alloc[0]:0
[2015-05-06T14:58:17.009]   type_cnt_avail[0]:2
[2015-05-06T14:58:17.009]   type[0]:gtx
[2015-05-06T14:58:17.009] gres/ram: state for smurf06
[2015-05-06T14:58:17.009]   gres_cnt found:8 configured:8 avail:8 alloc:0
[2015-05-06T14:58:17.009]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.009]   gres_used:(null)
[2015-05-06T14:58:17.009] gres/gram: state for smurf06
[2015-05-06T14:58:17.009]   gres_cnt found:1250 configured:1250 avail:1250 no_consume
[2015-05-06T14:58:17.009]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.009]   gres_used:(null)
[2015-05-06T14:58:17.009] gres/scratch: state for smurf06
[2015-05-06T14:58:17.009]   gres_cnt found:1800 configured:1800 avail:1800 alloc:0
[2015-05-06T14:58:17.009]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.009]   gres_used:(null)
[2015-05-06T14:58:17.009] debug:  validate_node_specs: node smurf06 registered with 0 jobs
[2015-05-06T14:58:17.009] gres/gpu: state for smurf04
[2015-05-06T14:58:17.009]   gres_cnt found:4 configured:4 avail:4 alloc:0
[2015-05-06T14:58:17.009]   gres_bit_alloc:
[2015-05-06T14:58:17.009]   gres_used:(null)
[2015-05-06T14:58:17.009]   topo_cpus_bitmap[0]:NULL
[2015-05-06T14:58:17.009]   topo_gres_bitmap[0]:0-3
[2015-05-06T14:58:17.009]   topo_gres_cnt_alloc[0]:0
[2015-05-06T14:58:17.009]   topo_gres_cnt_avail[0]:4
[2015-05-06T14:58:17.009]   type[0]:gtx
[2015-05-06T14:58:17.009]   type_cnt_alloc[0]:0
[2015-05-06T14:58:17.009]   type_cnt_avail[0]:4
[2015-05-06T14:58:17.009]   type[0]:gtx
[2015-05-06T14:58:17.009] gres/ram: state for smurf04
[2015-05-06T14:58:17.009]   gres_cnt found:94 configured:94 avail:94 alloc:0
[2015-05-06T14:58:17.009]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.009]   gres_used:(null)
[2015-05-06T14:58:17.009] gres/gram: state for smurf04
[2015-05-06T14:58:17.009]   gres_cnt found:1500 configured:1500 avail:1500 no_consume
[2015-05-06T14:58:17.010]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.010]   gres_used:(null)
[2015-05-06T14:58:17.010] gres/scratch: state for smurf04
[2015-05-06T14:58:17.010]   gres_cnt found:280 configured:280 avail:280 alloc:0
[2015-05-06T14:58:17.010]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.010]   gres_used:(null)
[2015-05-06T14:58:17.010] debug:  validate_node_specs: node smurf04 registered with 0 jobs
[2015-05-06T14:58:17.010] gres/gpu: state for smurf07
[2015-05-06T14:58:17.010]   gres_cnt found:2 configured:2 avail:2 alloc:0
[2015-05-06T14:58:17.010]   gres_bit_alloc:
[2015-05-06T14:58:17.010]   gres_used:(null)
[2015-05-06T14:58:17.010]   topo_cpus_bitmap[0]:NULL
[2015-05-06T14:58:17.010]   topo_gres_bitmap[0]:0-1
[2015-05-06T14:58:17.010]   topo_gres_cnt_alloc[0]:0
[2015-05-06T14:58:17.010]   topo_gres_cnt_avail[0]:2
[2015-05-06T14:58:17.010]   type[0]:gtx
[2015-05-06T14:58:17.010]   type_cnt_alloc[0]:0
[2015-05-06T14:58:17.010]   type_cnt_avail[0]:2
[2015-05-06T14:58:17.010]   type[0]:gtx
[2015-05-06T14:58:17.010] gres/ram: state for smurf07
[2015-05-06T14:58:17.010]   gres_cnt found:16 configured:16 avail:16 alloc:0
[2015-05-06T14:58:17.010]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.010]   gres_used:(null)
[2015-05-06T14:58:17.010] gres/gram: state for smurf07
[2015-05-06T14:58:17.010]   gres_cnt found:1250 configured:1250 avail:1250 no_consume
[2015-05-06T14:58:17.010]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.010]   gres_used:(null)
[2015-05-06T14:58:17.010] gres/scratch: state for smurf07
[2015-05-06T14:58:17.010]   gres_cnt found:54 configured:54 avail:54 alloc:0
[2015-05-06T14:58:17.010]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.010]   gres_used:(null)
[2015-05-06T14:58:17.010] debug:  validate_node_specs: node smurf07 registered with 0 jobs
[2015-05-06T14:58:17.010] gres/gpu: state for smurf05
[2015-05-06T14:58:17.010]   gres_cnt found:4 configured:4 avail:4 alloc:0
[2015-05-06T14:58:17.010]   gres_bit_alloc:
[2015-05-06T14:58:17.010]   gres_used:(null)
[2015-05-06T14:58:17.010]   topo_cpus_bitmap[0]:NULL
[2015-05-06T14:58:17.010]   topo_gres_bitmap[0]:0-3
[2015-05-06T14:58:17.010]   topo_gres_cnt_alloc[0]:0
[2015-05-06T14:58:17.010]   topo_gres_cnt_avail[0]:4
[2015-05-06T14:58:17.010]   type[0]:gtx
[2015-05-06T14:58:17.010]   type_cnt_alloc[0]:0
[2015-05-06T14:58:17.010]   type_cnt_avail[0]:4
[2015-05-06T14:58:17.010]   type[0]:gtx
[2015-05-06T14:58:17.010] gres/ram: state for smurf05
[2015-05-06T14:58:17.010]   gres_cnt found:256 configured:256 avail:256 alloc:0
[2015-05-06T14:58:17.010]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.010]   gres_used:(null)
[2015-05-06T14:58:17.010] gres/gram: state for smurf05
[2015-05-06T14:58:17.010]   gres_cnt found:6000 configured:6000 avail:6000 no_consume
[2015-05-06T14:58:17.010]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.010]   gres_used:(null)
[2015-05-06T14:58:17.010] gres/scratch: state for smurf05
[2015-05-06T14:58:17.010]   gres_cnt found:2400 configured:2400 avail:2400 alloc:0
[2015-05-06T14:58:17.010]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.010]   gres_used:(null)
[2015-05-06T14:58:17.010] debug:  validate_node_specs: node smurf05 registered with 0 jobs
[2015-05-06T14:58:17.010] gres/gpu: state for smurf03
[2015-05-06T14:58:17.010]   gres_cnt found:3 configured:3 avail:3 alloc:0
[2015-05-06T14:58:17.010]   gres_bit_alloc:
[2015-05-06T14:58:17.010]   gres_used:(null)
[2015-05-06T14:58:17.010]   topo_cpus_bitmap[0]:NULL
[2015-05-06T14:58:17.010]   topo_gres_bitmap[0]:0-2
[2015-05-06T14:58:17.010]   topo_gres_cnt_alloc[0]:0
[2015-05-06T14:58:17.010]   topo_gres_cnt_avail[0]:3
[2015-05-06T14:58:17.010]   type[0]:gtx
[2015-05-06T14:58:17.010]   type_cnt_alloc[0]:0
[2015-05-06T14:58:17.010]   type_cnt_avail[0]:3
[2015-05-06T14:58:17.010]   type[0]:gtx
[2015-05-06T14:58:17.010] gres/ram: state for smurf03
[2015-05-06T14:58:17.010]   gres_cnt found:94 configured:94 avail:94 alloc:0
[2015-05-06T14:58:17.010]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.010]   gres_used:(null)
[2015-05-06T14:58:17.010] gres/gram: state for smurf03
[2015-05-06T14:58:17.010]   gres_cnt found:1500 configured:1500 avail:1500 no_consume
[2015-05-06T14:58:17.010]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.010]   gres_used:(null)
[2015-05-06T14:58:17.010] gres/scratch: state for smurf03
[2015-05-06T14:58:17.010]   gres_cnt found:280 configured:280 avail:280 alloc:0
[2015-05-06T14:58:17.010]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.010]   gres_used:(null)
[2015-05-06T14:58:17.010] debug:  validate_node_specs: node smurf03 registered with 0 jobs
[2015-05-06T14:58:17.011] gres/gpu: state for smurf01
[2015-05-06T14:58:17.011]   gres_cnt found:8 configured:8 avail:8 alloc:0
[2015-05-06T14:58:17.011]   gres_bit_alloc:
[2015-05-06T14:58:17.011]   gres_used:(null)
[2015-05-06T14:58:17.011]   topo_cpus_bitmap[0]:NULL
[2015-05-06T14:58:17.011]   topo_gres_bitmap[0]:0
[2015-05-06T14:58:17.011]   topo_gres_cnt_alloc[0]:0
[2015-05-06T14:58:17.011]   topo_gres_cnt_avail[0]:1
[2015-05-06T14:58:17.011]   type[0]:tesla
[2015-05-06T14:58:17.011]   topo_cpus_bitmap[1]:NULL
[2015-05-06T14:58:17.011]   topo_gres_bitmap[1]:1
[2015-05-06T14:58:17.011]   topo_gres_cnt_alloc[1]:0
[2015-05-06T14:58:17.011]   topo_gres_cnt_avail[1]:1
[2015-05-06T14:58:17.011]   type[1]:tesla
[2015-05-06T14:58:17.011]   topo_cpus_bitmap[2]:NULL
[2015-05-06T14:58:17.011]   topo_gres_bitmap[2]:2
[2015-05-06T14:58:17.011]   topo_gres_cnt_alloc[2]:0
[2015-05-06T14:58:17.011]   topo_gres_cnt_avail[2]:1
[2015-05-06T14:58:17.011]   type[2]:tesla
[2015-05-06T14:58:17.011]   topo_cpus_bitmap[3]:NULL
[2015-05-06T14:58:17.011]   topo_gres_bitmap[3]:3
[2015-05-06T14:58:17.011]   topo_gres_cnt_alloc[3]:0
[2015-05-06T14:58:17.011]   topo_gres_cnt_avail[3]:1
[2015-05-06T14:58:17.011]   type[3]:tesla
[2015-05-06T14:58:17.011]   topo_cpus_bitmap[4]:NULL
[2015-05-06T14:58:17.011]   topo_gres_bitmap[4]:4
[2015-05-06T14:58:17.011]   topo_gres_cnt_alloc[4]:0
[2015-05-06T14:58:17.011]   topo_gres_cnt_avail[4]:1
[2015-05-06T14:58:17.011]   type[4]:tesla
[2015-05-06T14:58:17.011]   topo_cpus_bitmap[5]:NULL
[2015-05-06T14:58:17.011]   topo_gres_bitmap[5]:5
[2015-05-06T14:58:17.011]   topo_gres_cnt_alloc[5]:0
[2015-05-06T14:58:17.011]   topo_gres_cnt_avail[5]:1
[2015-05-06T14:58:17.011]   type[5]:tesla
[2015-05-06T14:58:17.011]   topo_cpus_bitmap[6]:NULL
[2015-05-06T14:58:17.011]   topo_gres_bitmap[6]:6
[2015-05-06T14:58:17.011]   topo_gres_cnt_alloc[6]:0
[2015-05-06T14:58:17.011]   topo_gres_cnt_avail[6]:1
[2015-05-06T14:58:17.011]   type[6]:tesla
[2015-05-06T14:58:17.011]   topo_cpus_bitmap[7]:NULL
[2015-05-06T14:58:17.011]   topo_gres_bitmap[7]:7
[2015-05-06T14:58:17.011]   topo_gres_cnt_alloc[7]:0
[2015-05-06T14:58:17.011]   topo_gres_cnt_avail[7]:1
[2015-05-06T14:58:17.011]   type[7]:tesla
[2015-05-06T14:58:17.011]   type_cnt_alloc[0]:0
[2015-05-06T14:58:17.011]   type_cnt_avail[0]:8
[2015-05-06T14:58:17.011]   type[0]:tesla
[2015-05-06T14:58:17.011] gres/ram: state for smurf01
[2015-05-06T14:58:17.011]   gres_cnt found:48 configured:48 avail:48 alloc:0
[2015-05-06T14:58:17.011]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.011]   gres_used:(null)
[2015-05-06T14:58:17.011] gres/gram: state for smurf01
[2015-05-06T14:58:17.011]   gres_cnt found:6000 configured:6000 avail:6000 no_consume
[2015-05-06T14:58:17.011]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.011]   gres_used:(null)
[2015-05-06T14:58:17.011] gres/scratch: state for smurf01
[2015-05-06T14:58:17.011]   gres_cnt found:1300 configured:1300 avail:1300 alloc:0
[2015-05-06T14:58:17.011]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.011]   gres_used:(null)
[2015-05-06T14:58:17.011] debug:  validate_node_specs: node smurf01 registered with 0 jobs
[2015-05-06T14:58:17.011] gres/gpu: state for smurf02
[2015-05-06T14:58:17.011]   gres_cnt found:8 configured:8 avail:8 alloc:0
[2015-05-06T14:58:17.011]   gres_bit_alloc:
[2015-05-06T14:58:17.011]   gres_used:(null)
[2015-05-06T14:58:17.011]   topo_cpus_bitmap[0]:NULL
[2015-05-06T14:58:17.011]   topo_gres_bitmap[0]:0-7
[2015-05-06T14:58:17.011]   topo_gres_cnt_alloc[0]:0
[2015-05-06T14:58:17.011]   topo_gres_cnt_avail[0]:8
[2015-05-06T14:58:17.011]   type[0]:tesla
[2015-05-06T14:58:17.011]   type_cnt_alloc[0]:0
[2015-05-06T14:58:17.011]   type_cnt_avail[0]:8
[2015-05-06T14:58:17.011]   type[0]:tesla
[2015-05-06T14:58:17.011] gres/ram: state for smurf02
[2015-05-06T14:58:17.011]   gres_cnt found:48 configured:48 avail:48 alloc:0
[2015-05-06T14:58:17.011]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.011]   gres_used:(null)
[2015-05-06T14:58:17.011] gres/gram: state for smurf02
[2015-05-06T14:58:17.011]   gres_cnt found:6000 configured:6000 avail:6000 no_consume
[2015-05-06T14:58:17.011]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.011]   gres_used:(null)
[2015-05-06T14:58:17.011] gres/scratch: state for smurf02
[2015-05-06T14:58:17.011]   gres_cnt found:1300 configured:1300 avail:1300 alloc:0
[2015-05-06T14:58:17.011]   gres_bit_alloc:NULL
[2015-05-06T14:58:17.011]   gres_used:(null)
[2015-05-06T14:58:17.011] debug:  validate_node_specs: node smurf02 registered with 0 jobs
[2015-05-06T14:58:18.013] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=4,partition_job_depth=0
[2015-05-06T14:58:18.013] debug:  sched: Running job scheduler

Name=gpu Type=tesla File=/dev/nvidia0
Name=gpu Type=tesla File=/dev/nvidia1
Name=gpu Type=tesla File=/dev/nvidia2
Name=gpu Type=tesla File=/dev/nvidia3
Name=gpu Type=tesla File=/dev/nvidia4
Name=gpu Type=tesla File=/dev/nvidia5
Name=gpu Type=tesla File=/dev/nvidia6
Name=gpu Type=tesla File=/dev/nvidia7
Name=ram Count=48
Name=gram Count=6000
Name=scratch Count=1300
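
This per-device gres.conf should declare the same totals per GRES name as smurf01's Gres= entry in slurm.conf. A small Python sketch to compare them follows; it assumes one Name= entry per line, '#' comments, and that an entry without Count= stands for a single device (true for the one-file-per-line gpu entries here, but not for a File= range, which this sketch does not expand). The function names are made up.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def gres_conf_totals(text: str) -> Dict[str, int]:
    """Sum the declared count per GRES name in a gres.conf."""
    totals: Counter = Counter()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = dict(tok.split("=", 1) for tok in line.split())
        # Assumption: no Count= means one device (single File= entry).
        totals[fields["Name"]] += int(fields.get("Count", "1"))
    return dict(totals)


def mismatches(gres_conf: Dict[str, int],
               slurm_conf: Dict[str, int]) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Names whose totals differ, mapped to (gres.conf, slurm.conf) values."""
    names = sorted(set(gres_conf) | set(slurm_conf))
    return {n: (gres_conf.get(n), slurm_conf.get(n))
            for n in names if gres_conf.get(n) != slurm_conf.get(n)}


if __name__ == "__main__":
    conf = "\n".join([f"Name=gpu Type=tesla File=/dev/nvidia{i}" for i in range(8)]
                     + ["Name=ram Count=48", "Name=gram Count=6000", "Name=scratch Count=1300"])
    # Counts from smurf01's Gres=gpu:tesla:8,ram:48,gram:no_consume:6000,scratch:1300
    expected = {"gpu": 8, "ram": 48, "gram": 6000, "scratch": 1300}
    print(mismatches(gres_conf_totals(conf), expected) or "totals agree")
```
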
[2015-05-06T14:52:52.502] debug:  init: Gres GPU plugin loaded
[2015-05-06T14:52:52.502] debug:  gres: Couldn't find the specified plugin name for gres/ram looking at all files
[2015-05-06T14:52:52.503] debug:  Cannot find plugin of type gres/ram, just track gres counts
[2015-05-06T14:52:52.503] debug:  gres: Couldn't find the specified plugin name for gres/gram looking at all files
[2015-05-06T14:52:52.503] debug:  Cannot find plugin of type gres/gram, just track gres counts
[2015-05-06T14:52:52.503] debug:  gres: Couldn't find the specified plugin name for gres/scratch looking at all files
[2015-05-06T14:52:52.503] debug:  Cannot find plugin of type gres/scratch, just track gres counts
[2015-05-06T14:52:52.504] Gres Name=gpu Type=tesla Count=1 ID=7696487 File=/dev/nvidia0
[2015-05-06T14:52:52.504] Gres Name=gpu Type=tesla Count=1 ID=7696487 File=/dev/nvidia1
[2015-05-06T14:52:52.504] Gres Name=gpu Type=tesla Count=1 ID=7696487 File=/dev/nvidia2
[2015-05-06T14:52:52.504] Gres Name=gpu Type=tesla Count=1 ID=7696487 File=/dev/nvidia3
[2015-05-06T14:52:52.504] Gres Name=gpu Type=tesla Count=1 ID=7696487 File=/dev/nvidia4
[2015-05-06T14:52:52.504] Gres Name=gpu Type=tesla Count=1 ID=7696487 File=/dev/nvidia5
[2015-05-06T14:52:52.504] Gres Name=gpu Type=tesla Count=1 ID=7696487 File=/dev/nvidia6
[2015-05-06T14:52:52.504] Gres Name=gpu Type=tesla Count=1 ID=7696487 File=/dev/nvidia7
[2015-05-06T14:52:52.504] Gres Name=ram Type=(null) Count=48 ID=7168370
[2015-05-06T14:52:52.504] Gres Name=gram Type=(null) Count=6000 ID=1835102823
[2015-05-06T14:52:52.504] Gres Name=scratch Type=(null) Count=1300 ID=1641727719
[2015-05-06T14:52:52.504] gpu 0 is device number 0
[2015-05-06T14:52:52.504] gpu 1 is device number 1
[2015-05-06T14:52:52.504] gpu 2 is device number 2
[2015-05-06T14:52:52.504] gpu 3 is device number 3
[2015-05-06T14:52:52.504] gpu 4 is device number 4
[2015-05-06T14:52:52.504] gpu 5 is device number 5
[2015-05-06T14:52:52.504] gpu 6 is device number 6
[2015-05-06T14:52:52.504] gpu 7 is device number 7
[2015-05-06T14:52:52.504] topology NONE plugin loaded
[2015-05-06T14:52:52.504] route default plugin loaded
[2015-05-06T14:52:52.505] debug:  cpu_freq_init: CPU:0 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.505] debug:  cpu_freq_init: CPU:1 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.505] debug:  cpu_freq_init: CPU:2 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.505] debug:  cpu_freq_init: CPU:3 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.505] debug:  cpu_freq_init: CPU:4 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.505] debug:  cpu_freq_init: CPU:5 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.505] debug:  cpu_freq_init: CPU:6 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.505] debug:  cpu_freq_init: CPU:7 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.505] debug:  cpu_freq_init: CPU:8 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.505] debug:  cpu_freq_init: CPU:9 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.505] debug:  cpu_freq_init: CPU:10 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.505] debug:  cpu_freq_init: CPU:11 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.506] debug:  cpu_freq_init: CPU:12 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.506] debug:  cpu_freq_init: CPU:13 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.506] debug:  cpu_freq_init: CPU:14 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.506] debug:  cpu_freq_init: CPU:15 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.506] debug:  cpu_freq_init: CPU:16 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.506] debug:  cpu_freq_init: CPU:17 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.506] debug:  cpu_freq_init: CPU:18 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.506] debug:  cpu_freq_init: CPU:19 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.506] debug:  cpu_freq_init: CPU:20 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.506] debug:  cpu_freq_init: CPU:21 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.506] debug:  cpu_freq_init: CPU:22 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.506] debug:  cpu_freq_init: CPU:23 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-06T14:52:52.506] No specialized cores configured by default on this node
[2015-05-06T14:52:52.506] Resource spec: system memory limit not configured for this node
[2015-05-06T14:52:52.507] debug:  Reading cgroup.conf file /etc/slurm/cgroup.conf
[2015-05-06T14:52:52.507] debug:  Reading cgroup.conf file /etc/slurm/cgroup.conf
[2015-05-06T14:52:52.508] debug:  task/cgroup: now constraining jobs allocated cores
[2015-05-06T14:52:52.508] debug:  task/cgroup: loaded
[2015-05-06T14:52:52.508] debug:  auth plugin for Munge (http://code.google.com/p/munge/) loaded
[2015-05-06T14:52:52.508] debug:  spank: opening plugin stack /etc/slurm/plugstack.conf
[2015-05-06T14:52:52.508] Munge cryptographic signature plugin loaded
[2015-05-06T14:52:52.509] Warning: Core limit is only 0 KB
[2015-05-06T14:52:52.509] slurmd version 14.11.5 started
[2015-05-06T14:52:52.509] debug:  Job accounting gather LINUX plugin loaded
[2015-05-06T14:52:52.509] debug:  job_container none plugin loaded
[2015-05-06T14:52:52.510] debug:  switch NONE plugin loaded
[2015-05-06T14:52:52.510] slurmd started on Wed, 06 May 2015 14:52:52 +0200
[2015-05-06T14:52:52.512] CPUs=24 Boards=1 Sockets=2 Cores=6 Threads=2 Memory=48128 TmpDisk=51175 Uptime=9544 CPUSpecList=(null)
[2015-05-06T14:52:52.512] debug:  AcctGatherEnergy NONE plugin loaded
[2015-05-06T14:52:52.512] debug:  AcctGatherProfile NONE plugin loaded
[2015-05-06T14:52:52.513] debug:  AcctGatherInfiniband NONE plugin loaded
[2015-05-06T14:52:52.513] debug:  AcctGatherFilesystem NONE plugin loaded


Reply via email to