Hi John,

DebugFlags=Gres has already been turned on. I raised the debug level to debug5, cleared the log file, restarted the daemons and tried to start a job with --gres=gpu:1.

I attached the log files, but I cannot find anything that points to a reason why starting jobs fails as long as GPU types are included.

Regards
Daniel

From: John Desantis [mailto:[email protected]]
Sent: Wednesday, 6 May 2015 23:43
To: slurm-dev
Subject: [slurm-dev] Re: slurm-dev Re: Job allocation for GPU jobs doesn't work using gpu plugin (node configuration not available)

Daniel,

Ok, at this point I'd suggest enabling the DebugFlags=Gres in your slurm.conf and turning up the SlurmctldDebug level to debug. You could also change SlurmdDebug to a higher debug level. There may be some clues in the extra output.

John DeSantis

2015-05-06 16:57 GMT-04:00 Daniel Weber <[email protected]>:

Hi John,

I replaced all gres.conf files with the "global" gres.conf file containing information about every node and restarted the controller daemon as well as the slave daemons.

The problem persists.

Regards
Daniel Weber

From: John Desantis [mailto:[email protected]]
Sent: Wednesday, 6 May 2015 22:34
To: slurm-dev
Subject: [slurm-dev] Re: slurm-dev Re: Job allocation for GPU jobs doesn't work using gpu plugin (node configuration not available)

Daniel,

Use the same gres.conf on all nodes in the cluster (including the controller), and then restart slurm and try again.

John DeSantis

On May 6, 2015 4:22 PM, "Daniel Weber" <[email protected]> wrote:

Hi John,

I added the types into slurm.conf and the gres.conf files on the nodes again and included a gres.conf on the controller node - without any success.

Slurm rejects jobs with "--gres=gpu:1" or "--gres=gpu:tesla:1".

slurm.conf

NodeName=smurf01 NodeAddr=192.168.1.101 Feature="intel,fermi" Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2 Gres=gpu:tesla:8,ram:48,gram:no_consume:6000,scratch:1300
...

gres.conf on controller

NodeName=smurf01 Name=gpu Type=tesla File=/dev/nvidia0 Count=1
NodeName=smurf01 Name=gpu Type=tesla File=/dev/nvidia1 Count=1
NodeName=smurf01 Name=gpu Type=tesla File=/dev/nvidia2 Count=1
NodeName=smurf01 Name=gpu Type=tesla File=/dev/nvidia3 Count=1
NodeName=smurf01 Name=gpu Type=tesla File=/dev/nvidia4 Count=1
NodeName=smurf01 Name=gpu Type=tesla File=/dev/nvidia5 Count=1
NodeName=smurf01 Name=gpu Type=tesla File=/dev/nvidia6 Count=1
NodeName=smurf01 Name=gpu Type=tesla File=/dev/nvidia7 Count=1
NodeName=smurf01 Name=ram Count=48
NodeName=smurf01 Name=gram Count=6000
NodeName=smurf01 Name=scratch Count=1300
...

gres.conf on smurf01

Name=gpu Type=tesla File=/dev/nvidia0 Count=1
Name=gpu Type=tesla File=/dev/nvidia1 Count=1
Name=gpu Type=tesla File=/dev/nvidia2 Count=1
Name=gpu Type=tesla File=/dev/nvidia3 Count=1
Name=gpu Type=tesla File=/dev/nvidia4 Count=1
Name=gpu Type=tesla File=/dev/nvidia5 Count=1
Name=gpu Type=tesla File=/dev/nvidia6 Count=1
Name=gpu Type=tesla File=/dev/nvidia7 Count=1
Name=ram Count=48
Name=gram Count=6000
Name=scratch Count=1300

Regards
Daniel

-----Original Message-----
From: John Desantis [mailto:[email protected]]
Sent: Wednesday, 6 May 2015 21:33
To: slurm-dev
Subject: [slurm-dev] Re: slurm-dev Re: Job allocation for GPU jobs doesn't work using gpu plugin (node configuration not available)

Daniel,

I hit send without completing my message:

# gres.conf
NodeName=blah Name=gpu Type=Tesla-T10 File=/dev/nvidia[0-1]

HTH.

John DeSantis

2015-05-06 15:30 GMT-04:00 John Desantis <[email protected]>:
> Daniel,
>
> You sparked an interest.
>
> I was able to get Gres Types working by:
>
> 1.) Ensuring that the type was defined in slurm.conf for the nodes in
> question;
> 2.) Ensuring that the global gres.conf respected the type.
>
> salloc -n 1 --gres=gpu:Tesla-T10:1
> salloc: Pending job allocation 532507
> salloc: job 532507 queued and waiting for resources
>
> # slurm.conf
> Nodename=blah CPUs=16 CoresPerSocket=4 Sockets=4 RealMemory=129055
> Feature=ib_ddr,ib_ofa,sse,sse2,sse3,tpa,cpu_xeon,xeon_E7330,gpu_T10,titan,mem_128G
> Gres=gpu:Tesla-T10:2 Weight=1000
>
> # gres.conf
>
>
> 2015-05-06 15:25 GMT-04:00 John Desantis <[email protected]>:
>>
>> Daniel,
>>
>> "I can handle that temporarily with node features instead but I'd
>> prefer utilizing the gpu types."
>>
>> Guilty of reading your response too quickly...
>>
>> John DeSantis
>>
>> 2015-05-06 15:22 GMT-04:00 John Desantis <[email protected]>:
>>> Daniel,
>>>
>>> Instead of defining the GPU type in our Gres configuration (global
>>> with hostnames, no count), we simply add a feature so that users can
>>> request a GPU (or GPUs) via Gres and the specific model via a
>>> constraint. This may help out the situation so that your users can
>>> request a specific GPU model.
>>>
>>> srun --gres=gpu:1 -C "gpu_k20"
>>>
>>> I didn't think of it at the time, but I remember running --gres=help
>>> when initially setting up GPUs to help rule out errors. I don't
>>> know if you ran that command or not, but it's worth a shot to verify
>>> that Gres types are being seen correctly on a node by the
>>> controller. I also wonder if using a cluster wide Gres definition
>>> (vs. only on nodes in question) would make a difference or not.
>>>
>>> John DeSantis
>>>
>>>
>>> 2015-05-06 15:12 GMT-04:00 Daniel Weber <[email protected]>:
>>>>
>>>> Hi John,
>>>>
>>>> I already tried using "Count=1" for each line as well as "Count=8" for a
>>>> single configuration line.
>>>>
>>>> I "solved" (or better circumvented) the problem by removing the "Type=..."
>>>> specifications from the "gres.conf" files and from the slurm.conf.
>>>>
>>>> The jobs are running successfully without the possibility to request a
>>>> certain GPU type.
>>>>
>>>> The generic resource examples on http://schedmd.com
>>>> explicitly show the "Type" specifications on GPUs and I really would like
>>>> to use them.
>>>> I can handle that temporarily with node features instead but I'd prefer
>>>> utilizing the gpu types.
>>>>
>>>> Thank you for your help (and the hint in the right direction).
>>>>
>>>> Kind regards
>>>> Daniel
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: John Desantis [mailto:[email protected]]
>>>> Sent: Wednesday, 6 May 2015 18:16
>>>> To: slurm-dev
>>>> Subject: [slurm-dev] Re: slurm-dev Re: Job allocation for GPU jobs
>>>> doesn't work using gpu plugin (node configuration not available)
>>>>
>>>>
>>>> Daniel,
>>>>
>>>> What about a count? Try adding a count=1 after each of your GPU lines.
>>>>
>>>> John DeSantis
>>>>
>>>> 2015-05-06 11:54 GMT-04:00 Daniel Weber <[email protected]>:
>>>>>
>>>>> The same "problem" occurs when using the gres type in the srun syntax
>>>>> (e.g. --gres=gpu:tesla:1).
>>>>>
>>>>> Regards,
>>>>> Daniel
>>>>>
>>>>> --
>>>>> From: John Desantis [mailto:[email protected]]
>>>>> Sent: Wednesday, 6 May 2015 17:39
>>>>> To: slurm-dev
>>>>> Subject: [slurm-dev] Re: Job allocation for GPU jobs doesn't work
>>>>> using gpu plugin (node configuration not available)
>>>>>
>>>>>
>>>>> Daniel,
>>>>>
>>>>> We don't specify types in our Gres configuration, simply the resource.
>>>>>
>>>>> What happens if you update your srun syntax to:
>>>>>
>>>>> srun -n1 --gres=gpu:tesla:1
>>>>>
>>>>> Does that dispatch the job?
>>>>>
>>>>> John DeSantis
>>>>>
>>>>> 2015-05-06 9:40 GMT-04:00 Daniel Weber <[email protected]>:
>>>>>> Hello,
>>>>>>
>>>>>> currently I'm trying to set up SLURM on a gpu cluster with a
>>>>>> small number of nodes (where smurf0[1-7] are the node names)
>>>>>> using the gpu plugin to allocate jobs (requiring gpus).
>>>>>>
>>>>>> Unfortunately, when trying to run a gpu-job (any number of gpus;
>>>>>> --gres=gpu:N), SLURM doesn't execute it, asserting unavailability
>>>>>> of the requested configuration.
>>>>>> I attached some logs and configuration text files in order to
>>>>>> provide any information necessary to analyze this issue.
>>>>>>
>>>>>> Note: Cross posted here: http://serverfault.com/questions/685258
>>>>>>
>>>>>> Example (using some test.sh which is echoing $CUDA_VISIBLE_DEVICES):
>>>>>>
>>>>>> srun -n1 --gres=gpu:1 test.sh
>>>>>> --> srun: error: Unable to allocate resources: Requested
>>>>>> node configuration is not available
>>>>>>
>>>>>> The slurmctld log for such calls shows:
>>>>>>
>>>>>> gres: gpu state for job X
>>>>>> gres_cnt:1 node_cnt:1 type:(null)
>>>>>> _pick_best_nodes: job X never runnable
>>>>>> _slurm_rpc_allocate_resources: Requested node
>>>>>> configuration is not available
>>>>>>
>>>>>> Jobs with any other type of configured generic resource complete
>>>>>> successfully:
>>>>>>
>>>>>> srun -n1 --gres=gram:500 test.sh
>>>>>> --> CUDA_VISIBLE_DEVICES=NoDevFiles
>>>>>>
>>>>>> The nodes and gres configuration in slurm.conf (which is attached
>>>>>> as well) are like:
>>>>>>
>>>>>> GresTypes=gpu,ram,gram,scratch
>>>>>> ...
>>>>>> NodeName=smurf01 NodeAddr=192.168.1.101 Feature="intel,fermi"
>>>>>> Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=2
>>>>>> Gres=gpu:tesla:8,ram:48,gram:no_consume:6000,scratch:1300
>>>>>> NodeName=smurf02 NodeAddr=192.168.1.102 Feature="intel,fermi"
>>>>>> Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=1
>>>>>> Gres=gpu:tesla:8,ram:48,gram:no_consume:6000,scratch:1300
>>>>>>
>>>>>> The respective gres.conf files are
>>>>>> Name=gpu Count=8 Type=tesla File=/dev/nvidia[0-7]
>>>>>> Name=ram Count=48
>>>>>> Name=gram Count=6000
>>>>>> Name=scratch Count=1300
>>>>>>
>>>>>> The output of "scontrol show node" lists all the nodes with the
>>>>>> correct gres configuration i.e.:
>>>>>>
>>>>>> NodeName=smurf01 Arch=x86_64 CoresPerSocket=6
>>>>>> CPUAlloc=0 CPUErr=0 CPUTot=24 CPULoad=0.01 Features=intel,fermi
>>>>>> Gres=gpu:tesla:8,ram:48,gram:no_consume:6000,scratch:1300
>>>>>> ...etc.
>>>>>>
>>>>>> As far as I can tell, the slurmd daemon on the nodes recognizes
>>>>>> the gpus (and other generic resources) correctly.
>>>>>>
>>>>>> My slurmd.log on node smurf01 says
>>>>>>
>>>>>> Gres Name = gpu Type = tesla Count = 8 ID = 7696487 File =
>>>>>> /dev/nvidia[0 - 7]
>>>>>>
>>>>>> The log for slurmctld shows
>>>>>>
>>>>>> gres / gpu: state for smurf01
>>>>>> gres_cnt found : 8 configured : 8 avail : 8 alloc : 0
>>>>>> gres_bit_alloc :
>>>>>> gres_used : (null)
>>>>>>
>>>>>> I can't figure out why the controller node states that jobs using
>>>>>> --gres=gpu:N are "never runnable" and why "the requested node
>>>>>> configuration is not available".
>>>>>> Any help is appreciated.
>>>>>>
>>>>>> Kind regards,
>>>>>> Daniel Weber
>>>>>>
>>>>>> PS: If further information is required, don't hesitate to ask.

Attached slurmctld log:

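Since much of the thread comes down to comparing the Gres= value in slurm.conf with the lines in gres.conf by eye, here is a minimal sketch of an offline consistency check. It is not Slurm's own validation logic, and the function names (parse_node_gres, parse_gres_conf, diff_gres) are made up for illustration. It assumes every Gres= entry carries an explicit numeric count, that NodeName= in gres.conf is a plain host name rather than a host list expression, and that a File= range looks like /dev/nvidia[0-7].

```python
import re
from collections import Counter


def parse_node_gres(spec):
    """Parse a slurm.conf Gres= value into {(name, type): count}.

    Assumes each entry ends in a numeric count, as in the thread's config.
    """
    totals = Counter()
    for item in spec.split(","):
        # The no_consume flag is not a type, so drop it before splitting roles.
        parts = [p for p in item.split(":") if p != "no_consume"]
        name, count = parts[0], int(parts[-1])
        gres_type = parts[1] if len(parts) == 3 else None
        totals[(name, gres_type)] += count
    return dict(totals)


def count_files(pattern):
    """Count device files named by a File= value such as /dev/nvidia[0-7]."""
    m = re.search(r"\[(\d+)-(\d+)\]", pattern)
    return int(m.group(2)) - int(m.group(1)) + 1 if m else 1


def parse_gres_conf(text, node):
    """Sum gres.conf lines that apply to `node` into {(name, type): count}."""
    totals = Counter()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if "Name" not in fields:
            continue
        # Lines without NodeName= apply to the local node (assumed to be `node`).
        if fields.get("NodeName", node) != node:
            continue
        if "Count" in fields:
            count = int(fields["Count"])
        elif "File" in fields:
            count = count_files(fields["File"])
        else:
            count = 1
        totals[(fields["Name"], fields.get("Type"))] += count
    return dict(totals)


def diff_gres(node_spec, gres_conf_text, node):
    """Return {(name, type): (slurm.conf count, gres.conf count)} for mismatches."""
    want = parse_node_gres(node_spec)
    have = parse_gres_conf(gres_conf_text, node)
    keys = sorted(set(want) | set(have), key=str)
    return {k: (want.get(k), have.get(k)) for k in keys if want.get(k) != have.get(k)}


# Values copied from the original post for smurf01.
slurm_gres = "gpu:tesla:8,ram:48,gram:no_consume:6000,scratch:1300"
gres_conf = """\
Name=gpu Count=8 Type=tesla File=/dev/nvidia[0-7]
Name=ram Count=48
Name=gram Count=6000
Name=scratch Count=1300
"""
print(diff_gres(slurm_gres, gres_conf, "smurf01"))
```

An empty result only rules out a count or type mismatch between the two files. The configuration posted in this thread passes such a check and the jobs are still rejected, so the check narrows the search rather than explaining the failure.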
[2015-05-07T00:01:24.878] debug: sched: slurmctld starting [2015-05-07T00:01:24.879] debug3: Version in last_conf_lite header is 7168 [2015-05-07T00:01:24.880] slurmctld version 14.11.5 started on cluster egpc [2015-05-07T00:01:24.880] debug3: Trying to load plugin /usr/lib64/slurm/crypto_munge.so [2015-05-07T00:01:24.881] Munge cryptographic signature plugin loaded [2015-05-07T00:01:24.881] debug3: Success. [2015-05-07T00:01:24.881] debug3: Trying to load plugin /usr/lib64/slurm/gres_gpu.so [2015-05-07T00:01:24.881] debug: init: Gres GPU plugin loaded [2015-05-07T00:01:24.881] debug3: Success. [2015-05-07T00:01:24.881] debug3: Trying to load plugin /usr/lib64/slurm/gres_ram.so [2015-05-07T00:01:24.881] debug4: /usr/lib64/slurm/gres_ram.so: Does not exist or not a regular file. [2015-05-07T00:01:24.881] debug: gres: Couldn't find the specified plugin name for gres/ram looking at all files [2015-05-07T00:01:24.881] debug: Cannot find plugin of type gres/ram, just track gres counts [2015-05-07T00:01:24.881] debug3: Trying to load plugin /usr/lib64/slurm/gres_gram.so [2015-05-07T00:01:24.881] debug4: /usr/lib64/slurm/gres_gram.so: Does not exist or not a regular file. [2015-05-07T00:01:24.881] debug: gres: Couldn't find the specified plugin name for gres/gram looking at all files [2015-05-07T00:01:24.882] debug: Cannot find plugin of type gres/gram, just track gres counts [2015-05-07T00:01:24.882] debug3: Trying to load plugin /usr/lib64/slurm/gres_scratch.so [2015-05-07T00:01:24.882] debug4: /usr/lib64/slurm/gres_scratch.so: Does not exist or not a regular file. [2015-05-07T00:01:24.882] debug: gres: Couldn't find the specified plugin name for gres/scratch looking at all files [2015-05-07T00:01:24.882] debug: Cannot find plugin of type gres/scratch, just track gres counts [2015-05-07T00:01:24.882] debug3: Trying to load plugin /usr/lib64/slurm/select_linear.so [2015-05-07T00:01:24.882] debug3: Success. 
[2015-05-07T00:01:24.882] debug3: Trying to load plugin /usr/lib64/slurm/preempt_none.so [2015-05-07T00:01:24.882] preempt/none loaded [2015-05-07T00:01:24.882] debug3: Success. [2015-05-07T00:01:24.882] debug3: Trying to load plugin /usr/lib64/slurm/checkpoint_none.so [2015-05-07T00:01:24.882] debug3: Success. [2015-05-07T00:01:24.882] debug: Checkpoint plugin loaded: checkpoint/none [2015-05-07T00:01:24.882] debug3: Trying to load plugin /usr/lib64/slurm/acct_gather_energy_none.so [2015-05-07T00:01:24.882] debug: AcctGatherEnergy NONE plugin loaded [2015-05-07T00:01:24.882] debug3: Success. [2015-05-07T00:01:24.882] debug3: Trying to load plugin /usr/lib64/slurm/acct_gather_profile_none.so [2015-05-07T00:01:24.882] debug: AcctGatherProfile NONE plugin loaded [2015-05-07T00:01:24.882] debug3: Success. [2015-05-07T00:01:24.882] debug3: Trying to load plugin /usr/lib64/slurm/acct_gather_infiniband_none.so [2015-05-07T00:01:24.882] debug: AcctGatherInfiniband NONE plugin loaded [2015-05-07T00:01:24.882] debug3: Success. [2015-05-07T00:01:24.882] debug3: Trying to load plugin /usr/lib64/slurm/acct_gather_filesystem_none.so [2015-05-07T00:01:24.883] debug: AcctGatherFilesystem NONE plugin loaded [2015-05-07T00:01:24.883] debug3: Success. [2015-05-07T00:01:24.883] debug2: No acct_gather.conf file (/etc/slurm/acct_gather.conf) [2015-05-07T00:01:24.883] debug3: Trying to load plugin /usr/lib64/slurm/jobacct_gather_linux.so [2015-05-07T00:01:24.883] debug: Job accounting gather LINUX plugin loaded [2015-05-07T00:01:24.883] debug3: Success. [2015-05-07T00:01:24.883] debug3: Trying to load plugin /usr/lib64/slurm/ext_sensors_none.so [2015-05-07T00:01:24.883] ExtSensors NONE plugin loaded [2015-05-07T00:01:24.883] debug3: Success. [2015-05-07T00:01:24.883] debug3: Trying to load plugin /usr/lib64/slurm/switch_none.so [2015-05-07T00:01:24.883] debug: switch NONE plugin loaded [2015-05-07T00:01:24.883] debug3: Success. 
[2015-05-07T00:01:24.883] debug: No backup controller to shutdown [2015-05-07T00:01:24.883] debug3: Trying to load plugin /usr/lib64/slurm/accounting_storage_slurmdbd.so [2015-05-07T00:01:24.883] Accounting storage SLURMDBD plugin loaded with AuthInfo=(null) [2015-05-07T00:01:24.883] debug3: Success. [2015-05-07T00:01:24.883] debug4: Accounting storage SLURMDBD plugin loaded [2015-05-07T00:01:24.885] debug3: Trying to load plugin /usr/lib64/slurm/auth_munge.so [2015-05-07T00:01:24.885] debug: auth plugin for Munge (http://code.google.com/p/munge/) loaded [2015-05-07T00:01:24.885] debug3: Success. [2015-05-07T00:01:24.887] debug: slurmdbd: Sent DbdInit msg [2015-05-07T00:01:24.887] slurmdbd: recovered 0 pending RPCs [2015-05-07T00:01:25.130] debug2: assoc 3(akes, (null)) has direct parent of 1(root, (null)) [2015-05-07T00:01:25.130] debug2: assoc 6(mat, (null)) has direct parent of 3(akes, (null)) [2015-05-07T00:01:25.130] debug2: assoc 5(dev, (null)) has direct parent of 3(akes, (null)) [2015-05-07T00:01:25.130] debug2: assoc 4(bio, (null)) has direct parent of 3(akes, (null)) [2015-05-07T00:01:25.130] debug2: assoc 2(root, root) has direct parent of 1(root, (null)) [2015-05-07T00:01:25.130] debug2: user root default acct is root [2015-05-07T00:01:25.130] debug3: assoc 3(akes (null)) normalize = 0.500000 from 3(akes (null)) 1 / 2 = 0.500000 [2015-05-07T00:01:25.130] debug3: assoc 6(mat (null)) normalize = 0.333333 from 6(mat (null)) 1 / 3 = 0.333333 [2015-05-07T00:01:25.130] debug3: assoc 6(mat (null)) normalize = 0.166667 from 3(akes (null)) 1 / 2 = 0.500000 [2015-05-07T00:01:25.130] debug3: assoc 5(dev (null)) normalize = 0.333333 from 5(dev (null)) 1 / 3 = 0.333333 [2015-05-07T00:01:25.130] debug3: assoc 5(dev (null)) normalize = 0.166667 from 3(akes (null)) 1 / 2 = 0.500000 [2015-05-07T00:01:25.130] debug3: assoc 4(bio (null)) normalize = 0.333333 from 4(bio (null)) 1 / 3 = 0.333333 [2015-05-07T00:01:25.130] debug3: assoc 4(bio (null)) normalize = 0.166667 from 
3(akes (null)) 1 / 2 = 0.500000 [2015-05-07T00:01:25.130] debug3: assoc 2(root root) normalize = 0.500000 from 2(root root) 1 / 2 = 0.500000 [2015-05-07T00:01:25.211] debug3: Version in assoc_mgr_state header is 1 [2015-05-07T00:01:25.211] debug3: Version in assoc_mgr_state header is 1 [2015-05-07T00:01:25.211] debug: Reading slurm.conf file: /etc/slurm/slurm.conf [2015-05-07T00:01:25.213] debug3: layouts: slurm_layouts_init()... [2015-05-07T00:01:25.213] layouts: no layout to initialize [2015-05-07T00:01:25.213] debug3: Trying to load plugin /usr/lib64/slurm/topology_none.so [2015-05-07T00:01:25.213] topology NONE plugin loaded [2015-05-07T00:01:25.213] debug3: Success. [2015-05-07T00:01:25.213] debug: No DownNodes [2015-05-07T00:01:25.213] debug2: partition work does not allow root jobs [2015-05-07T00:01:25.213] debug3: Trying to load plugin /usr/lib64/slurm/jobcomp_none.so [2015-05-07T00:01:25.213] debug3: Success. [2015-05-07T00:01:25.213] debug3: Trying to load plugin /usr/lib64/slurm/sched_backfill.so [2015-05-07T00:01:25.213] sched: Backfill scheduler plugin loaded [2015-05-07T00:01:25.213] debug3: Success. [2015-05-07T00:01:25.213] debug3: Trying to load plugin /usr/lib64/slurm/route_default.so [2015-05-07T00:01:25.213] route default plugin loaded [2015-05-07T00:01:25.214] debug3: Success. 
[2015-05-07T00:01:25.214] layouts: loading entities/relations information [2015-05-07T00:01:25.214] debug3: layouts: loading node smurf01 [2015-05-07T00:01:25.214] debug3: layouts: loading node smurf02 [2015-05-07T00:01:25.214] debug3: layouts: loading node smurf03 [2015-05-07T00:01:25.214] debug3: layouts: loading node smurf04 [2015-05-07T00:01:25.214] debug3: layouts: loading node smurf05 [2015-05-07T00:01:25.214] debug3: layouts: loading node smurf06 [2015-05-07T00:01:25.214] debug3: layouts: loading node smurf07 [2015-05-07T00:01:25.214] debug: layouts: 7/7 nodes in hash table, rc=0 [2015-05-07T00:01:25.214] debug: layouts: loading stage 1 [2015-05-07T00:01:25.214] debug: layouts: loading stage 2 [2015-05-07T00:01:25.214] debug3: Version string in node_state header is PROTOCOL_VERSION [2015-05-07T00:01:25.214] Recovered state of 7 nodes [2015-05-07T00:01:25.214] debug3: Version string in job_state header is PROTOCOL_VERSION [2015-05-07T00:01:25.214] debug3: Job id in job_state header is 156 [2015-05-07T00:01:25.214] debug3: Set job_id_sequence to 156 [2015-05-07T00:01:25.214] Recovered information about 0 jobs [2015-05-07T00:01:25.214] init_requeue_policy: kill_invalid_depend is set to 0 [2015-05-07T00:01:25.214] gres/gpu: state for smurf01 [2015-05-07T00:01:25.214] gres_cnt found:TBD configured:8 avail:8 alloc:0 [2015-05-07T00:01:25.214] gres_bit_alloc: [2015-05-07T00:01:25.214] gres_used:(null) [2015-05-07T00:01:25.214] type_cnt_alloc[0]:0 [2015-05-07T00:01:25.214] type_cnt_avail[0]:8 [2015-05-07T00:01:25.214] type[0]:tesla [2015-05-07T00:01:25.214] gres/ram: state for smurf01 [2015-05-07T00:01:25.214] gres_cnt found:TBD configured:48 avail:48 alloc:0 [2015-05-07T00:01:25.214] gres_bit_alloc:NULL [2015-05-07T00:01:25.214] gres_used:(null) [2015-05-07T00:01:25.214] gres/gram: state for smurf01 [2015-05-07T00:01:25.214] gres_cnt found:TBD configured:6000 avail:6000 no_consume [2015-05-07T00:01:25.214] gres_bit_alloc:NULL [2015-05-07T00:01:25.214] 
gres_used:(null) [2015-05-07T00:01:25.214] gres/scratch: state for smurf01 [2015-05-07T00:01:25.214] gres_cnt found:TBD configured:1300 avail:1300 alloc:0 [2015-05-07T00:01:25.214] gres_bit_alloc:NULL [2015-05-07T00:01:25.214] gres_used:(null) [2015-05-07T00:01:25.214] gres/gpu: state for smurf02 [2015-05-07T00:01:25.214] gres_cnt found:TBD configured:8 avail:8 alloc:0 [2015-05-07T00:01:25.214] gres_bit_alloc: [2015-05-07T00:01:25.214] gres_used:(null) [2015-05-07T00:01:25.214] type_cnt_alloc[0]:0 [2015-05-07T00:01:25.214] type_cnt_avail[0]:8 [2015-05-07T00:01:25.214] type[0]:tesla [2015-05-07T00:01:25.214] gres/ram: state for smurf02 [2015-05-07T00:01:25.214] gres_cnt found:TBD configured:48 avail:48 alloc:0 [2015-05-07T00:01:25.214] gres_bit_alloc:NULL [2015-05-07T00:01:25.214] gres_used:(null) [2015-05-07T00:01:25.214] gres/gram: state for smurf02 [2015-05-07T00:01:25.214] gres_cnt found:TBD configured:6000 avail:6000 no_consume [2015-05-07T00:01:25.214] gres_bit_alloc:NULL [2015-05-07T00:01:25.214] gres_used:(null) [2015-05-07T00:01:25.214] gres/scratch: state for smurf02 [2015-05-07T00:01:25.214] gres_cnt found:TBD configured:1300 avail:1300 alloc:0 [2015-05-07T00:01:25.214] gres_bit_alloc:NULL [2015-05-07T00:01:25.214] gres_used:(null) [2015-05-07T00:01:25.214] gres/gpu: state for smurf03 [2015-05-07T00:01:25.214] gres_cnt found:TBD configured:3 avail:3 alloc:0 [2015-05-07T00:01:25.214] gres_bit_alloc: [2015-05-07T00:01:25.214] gres_used:(null) [2015-05-07T00:01:25.214] type_cnt_alloc[0]:0 [2015-05-07T00:01:25.214] type_cnt_avail[0]:3 [2015-05-07T00:01:25.214] type[0]:gtx [2015-05-07T00:01:25.214] gres/ram: state for smurf03 [2015-05-07T00:01:25.214] gres_cnt found:TBD configured:94 avail:94 alloc:0 [2015-05-07T00:01:25.214] gres_bit_alloc:NULL [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] gres/gram: state for smurf03 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:1500 avail:1500 no_consume [2015-05-07T00:01:25.215] 
gres_bit_alloc:NULL [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] gres/scratch: state for smurf03 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:280 avail:280 alloc:0 [2015-05-07T00:01:25.215] gres_bit_alloc:NULL [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] gres/gpu: state for smurf04 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:4 avail:4 alloc:0 [2015-05-07T00:01:25.215] gres_bit_alloc: [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] type_cnt_alloc[0]:0 [2015-05-07T00:01:25.215] type_cnt_avail[0]:4 [2015-05-07T00:01:25.215] type[0]:gtx [2015-05-07T00:01:25.215] gres/ram: state for smurf04 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:94 avail:94 alloc:0 [2015-05-07T00:01:25.215] gres_bit_alloc:NULL [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] gres/gram: state for smurf04 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:1500 avail:1500 no_consume [2015-05-07T00:01:25.215] gres_bit_alloc:NULL [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] gres/scratch: state for smurf04 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:280 avail:280 alloc:0 [2015-05-07T00:01:25.215] gres_bit_alloc:NULL [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] gres/gpu: state for smurf05 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:4 avail:4 alloc:0 [2015-05-07T00:01:25.215] gres_bit_alloc: [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] type_cnt_alloc[0]:0 [2015-05-07T00:01:25.215] type_cnt_avail[0]:4 [2015-05-07T00:01:25.215] type[0]:gtx [2015-05-07T00:01:25.215] gres/ram: state for smurf05 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:256 avail:256 alloc:0 [2015-05-07T00:01:25.215] gres_bit_alloc:NULL [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] gres/gram: state for smurf05 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:6000 avail:6000 
no_consume [2015-05-07T00:01:25.215] gres_bit_alloc:NULL [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] gres/scratch: state for smurf05 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:2400 avail:2400 alloc:0 [2015-05-07T00:01:25.215] gres_bit_alloc:NULL [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] gres/gpu: state for smurf06 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:2 avail:2 alloc:0 [2015-05-07T00:01:25.215] gres_bit_alloc: [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] type_cnt_alloc[0]:0 [2015-05-07T00:01:25.215] type_cnt_avail[0]:2 [2015-05-07T00:01:25.215] type[0]:gtx [2015-05-07T00:01:25.215] gres/ram: state for smurf06 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:8 avail:8 alloc:0 [2015-05-07T00:01:25.215] gres_bit_alloc:NULL [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] gres/gram: state for smurf06 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:1250 avail:1250 no_consume [2015-05-07T00:01:25.215] gres_bit_alloc:NULL [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] gres/scratch: state for smurf06 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:1800 avail:1800 alloc:0 [2015-05-07T00:01:25.215] gres_bit_alloc:NULL [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] gres/gpu: state for smurf07 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:2 avail:2 alloc:0 [2015-05-07T00:01:25.215] gres_bit_alloc: [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] type_cnt_alloc[0]:0 [2015-05-07T00:01:25.215] type_cnt_avail[0]:2 [2015-05-07T00:01:25.215] type[0]:gtx [2015-05-07T00:01:25.215] gres/ram: state for smurf07 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:16 avail:16 alloc:0 [2015-05-07T00:01:25.215] gres_bit_alloc:NULL [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] gres/gram: state for smurf07 [2015-05-07T00:01:25.215] gres_cnt 
found:TBD configured:1250 avail:1250 no_consume [2015-05-07T00:01:25.215] gres_bit_alloc:NULL [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] gres/scratch: state for smurf07 [2015-05-07T00:01:25.215] gres_cnt found:TBD configured:54 avail:54 alloc:0 [2015-05-07T00:01:25.215] gres_bit_alloc:NULL [2015-05-07T00:01:25.215] gres_used:(null) [2015-05-07T00:01:25.215] debug: Updating partition uid access list [2015-05-07T00:01:25.215] debug3: Version string in resv_state header is PROTOCOL_VERSION [2015-05-07T00:01:25.215] Recovered state of 0 reservations [2015-05-07T00:01:25.215] State of 0 triggers recovered [2015-05-07T00:01:25.215] read_slurm_conf: backup_controller not specified. [2015-05-07T00:01:25.215] Running as primary controller [2015-05-07T00:01:25.215] Registering slurmctld at port 6817 with slurmdbd. [2015-05-07T00:01:25.296] debug2: Sending cpu count of 94 for cluster [2015-05-07T00:01:25.376] debug3: Trying to load plugin /usr/lib64/slurm/priority_multifactor.so [2015-05-07T00:01:25.377] debug: Priority MULTIFACTOR plugin loaded [2015-05-07T00:01:25.377] debug3: Success. 
[2015-05-07T00:01:25.377] debug3: _slurmctld_rpc_mgr pid = 23712 [2015-05-07T00:01:25.377] debug3: _slurmctld_background pid = 23712 [2015-05-07T00:01:25.377] debug: power_save module disabled, SuspendTime < 0 [2015-05-07T00:01:25.377] debug2: slurmctld listening on 0.0.0.0:6817 [2015-05-07T00:01:25.378] debug4: done writing time 1430949685 [2015-05-07T00:01:25.468] debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from uid=0 [2015-05-07T00:01:25.468] gres/gpu: state for smurf02 [2015-05-07T00:01:25.468] gres_cnt found:8 configured:8 avail:8 alloc:0 [2015-05-07T00:01:25.468] gres_bit_alloc: [2015-05-07T00:01:25.468] gres_used:(null) [2015-05-07T00:01:25.468] topo_cpus_bitmap[0]:NULL [2015-05-07T00:01:25.468] topo_gres_bitmap[0]:0-7 [2015-05-07T00:01:25.468] topo_gres_cnt_alloc[0]:0 [2015-05-07T00:01:25.468] topo_gres_cnt_avail[0]:8 [2015-05-07T00:01:25.468] type[0]:tesla [2015-05-07T00:01:25.468] type_cnt_alloc[0]:0 [2015-05-07T00:01:25.468] type_cnt_avail[0]:8 [2015-05-07T00:01:25.468] type[0]:tesla [2015-05-07T00:01:25.468] gres/ram: state for smurf02 [2015-05-07T00:01:25.468] gres_cnt found:48 configured:48 avail:48 alloc:0 [2015-05-07T00:01:25.468] gres_bit_alloc:NULL [2015-05-07T00:01:25.468] gres_used:(null) [2015-05-07T00:01:25.468] gres/gram: state for smurf02 [2015-05-07T00:01:25.468] gres_cnt found:6000 configured:6000 avail:6000 no_consume [2015-05-07T00:01:25.468] gres_bit_alloc:NULL [2015-05-07T00:01:25.468] gres_used:(null) [2015-05-07T00:01:25.468] gres/scratch: state for smurf02 [2015-05-07T00:01:25.468] gres_cnt found:1300 configured:1300 avail:1300 alloc:0 [2015-05-07T00:01:25.468] gres_bit_alloc:NULL [2015-05-07T00:01:25.468] gres_used:(null) [2015-05-07T00:01:25.468] debug: validate_node_specs: node smurf02 registered with 0 jobs [2015-05-07T00:01:25.468] debug2: _slurm_rpc_node_registration complete for smurf02 usec=314 [2015-05-07T00:01:25.819] debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from uid=0 [2015-05-07T00:01:25.819] 
gres/gpu: state for smurf03 [2015-05-07T00:01:25.819] gres_cnt found:3 configured:3 avail:3 alloc:0 [2015-05-07T00:01:25.819] gres_bit_alloc: [2015-05-07T00:01:25.819] gres_used:(null) [2015-05-07T00:01:25.819] topo_cpus_bitmap[0]:NULL [2015-05-07T00:01:25.819] topo_gres_bitmap[0]:0-2 [2015-05-07T00:01:25.819] topo_gres_cnt_alloc[0]:0 [2015-05-07T00:01:25.819] topo_gres_cnt_avail[0]:3 [2015-05-07T00:01:25.819] type[0]:gtx [2015-05-07T00:01:25.819] type_cnt_alloc[0]:0 [2015-05-07T00:01:25.819] type_cnt_avail[0]:3 [2015-05-07T00:01:25.819] type[0]:gtx [2015-05-07T00:01:25.819] gres/ram: state for smurf03 [2015-05-07T00:01:25.819] gres_cnt found:94 configured:94 avail:94 alloc:0 [2015-05-07T00:01:25.819] gres_bit_alloc:NULL [2015-05-07T00:01:25.819] gres_used:(null) [2015-05-07T00:01:25.819] gres/gram: state for smurf03 [2015-05-07T00:01:25.819] gres_cnt found:1500 configured:1500 avail:1500 no_consume [2015-05-07T00:01:25.819] gres_bit_alloc:NULL [2015-05-07T00:01:25.819] gres_used:(null) [2015-05-07T00:01:25.819] gres/scratch: state for smurf03 [2015-05-07T00:01:25.819] gres_cnt found:280 configured:280 avail:280 alloc:0 [2015-05-07T00:01:25.819] gres_bit_alloc:NULL [2015-05-07T00:01:25.819] gres_used:(null) [2015-05-07T00:01:25.819] debug: validate_node_specs: node smurf03 registered with 0 jobs [2015-05-07T00:01:25.819] debug2: _slurm_rpc_node_registration complete for smurf03 usec=313 [2015-05-07T00:01:26.144] debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from uid=0 [2015-05-07T00:01:26.144] gres/gpu: state for smurf04 [2015-05-07T00:01:26.144] gres_cnt found:4 configured:4 avail:4 alloc:0 [2015-05-07T00:01:26.144] gres_bit_alloc: [2015-05-07T00:01:26.144] gres_used:(null) [2015-05-07T00:01:26.144] topo_cpus_bitmap[0]:NULL [2015-05-07T00:01:26.144] topo_gres_bitmap[0]:0-3 [2015-05-07T00:01:26.144] topo_gres_cnt_alloc[0]:0 [2015-05-07T00:01:26.144] topo_gres_cnt_avail[0]:4 [2015-05-07T00:01:26.144] type[0]:gtx [2015-05-07T00:01:26.144] 
type_cnt_alloc[0]:0 [2015-05-07T00:01:26.144] type_cnt_avail[0]:4 [2015-05-07T00:01:26.144] type[0]:gtx [2015-05-07T00:01:26.144] gres/ram: state for smurf04 [2015-05-07T00:01:26.144] gres_cnt found:94 configured:94 avail:94 alloc:0 [2015-05-07T00:01:26.144] gres_bit_alloc:NULL [2015-05-07T00:01:26.144] gres_used:(null) [2015-05-07T00:01:26.144] gres/gram: state for smurf04 [2015-05-07T00:01:26.144] gres_cnt found:1500 configured:1500 avail:1500 no_consume [2015-05-07T00:01:26.144] gres_bit_alloc:NULL [2015-05-07T00:01:26.144] gres_used:(null) [2015-05-07T00:01:26.144] gres/scratch: state for smurf04 [2015-05-07T00:01:26.144] gres_cnt found:280 configured:280 avail:280 alloc:0 [2015-05-07T00:01:26.144] gres_bit_alloc:NULL [2015-05-07T00:01:26.144] gres_used:(null) [2015-05-07T00:01:26.144] debug: validate_node_specs: node smurf04 registered with 0 jobs [2015-05-07T00:01:26.144] debug2: _slurm_rpc_node_registration complete for smurf04 usec=286 [2015-05-07T00:01:26.227] debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from uid=0 [2015-05-07T00:01:26.227] gres/gpu: state for smurf01 [2015-05-07T00:01:26.227] gres_cnt found:8 configured:8 avail:8 alloc:0 [2015-05-07T00:01:26.227] gres_bit_alloc: [2015-05-07T00:01:26.227] gres_used:(null) [2015-05-07T00:01:26.227] topo_cpus_bitmap[0]:NULL [2015-05-07T00:01:26.227] topo_gres_bitmap[0]:0 [2015-05-07T00:01:26.227] topo_gres_cnt_alloc[0]:0 [2015-05-07T00:01:26.227] topo_gres_cnt_avail[0]:1 [2015-05-07T00:01:26.227] type[0]:tesla [2015-05-07T00:01:26.227] topo_cpus_bitmap[1]:NULL [2015-05-07T00:01:26.227] topo_gres_bitmap[1]:1 [2015-05-07T00:01:26.227] topo_gres_cnt_alloc[1]:0 [2015-05-07T00:01:26.227] topo_gres_cnt_avail[1]:1 [2015-05-07T00:01:26.227] type[1]:tesla [2015-05-07T00:01:26.227] topo_cpus_bitmap[2]:NULL [2015-05-07T00:01:26.227] topo_gres_bitmap[2]:2 [2015-05-07T00:01:26.228] topo_gres_cnt_alloc[2]:0 [2015-05-07T00:01:26.228] topo_gres_cnt_avail[2]:1 [2015-05-07T00:01:26.228] type[2]:tesla 
[2015-05-07T00:01:26.228] topo_cpus_bitmap[3]:NULL [2015-05-07T00:01:26.228] topo_gres_bitmap[3]:3 [2015-05-07T00:01:26.228] topo_gres_cnt_alloc[3]:0 [2015-05-07T00:01:26.228] topo_gres_cnt_avail[3]:1 [2015-05-07T00:01:26.228] type[3]:tesla [2015-05-07T00:01:26.228] topo_cpus_bitmap[4]:NULL [2015-05-07T00:01:26.228] topo_gres_bitmap[4]:4 [2015-05-07T00:01:26.228] topo_gres_cnt_alloc[4]:0 [2015-05-07T00:01:26.228] topo_gres_cnt_avail[4]:1 [2015-05-07T00:01:26.228] type[4]:tesla [2015-05-07T00:01:26.228] topo_cpus_bitmap[5]:NULL [2015-05-07T00:01:26.228] topo_gres_bitmap[5]:5 [2015-05-07T00:01:26.228] topo_gres_cnt_alloc[5]:0 [2015-05-07T00:01:26.228] topo_gres_cnt_avail[5]:1 [2015-05-07T00:01:26.228] type[5]:tesla [2015-05-07T00:01:26.228] topo_cpus_bitmap[6]:NULL [2015-05-07T00:01:26.228] topo_gres_bitmap[6]:6 [2015-05-07T00:01:26.228] topo_gres_cnt_alloc[6]:0 [2015-05-07T00:01:26.228] topo_gres_cnt_avail[6]:1 [2015-05-07T00:01:26.228] type[6]:tesla [2015-05-07T00:01:26.228] topo_cpus_bitmap[7]:NULL [2015-05-07T00:01:26.228] topo_gres_bitmap[7]:7 [2015-05-07T00:01:26.228] topo_gres_cnt_alloc[7]:0 [2015-05-07T00:01:26.228] topo_gres_cnt_avail[7]:1 [2015-05-07T00:01:26.228] type[7]:tesla [2015-05-07T00:01:26.228] type_cnt_alloc[0]:0 [2015-05-07T00:01:26.228] type_cnt_avail[0]:8 [2015-05-07T00:01:26.228] type[0]:tesla [2015-05-07T00:01:26.228] gres/ram: state for smurf01 [2015-05-07T00:01:26.228] gres_cnt found:48 configured:48 avail:48 alloc:0 [2015-05-07T00:01:26.228] gres_bit_alloc:NULL [2015-05-07T00:01:26.228] gres_used:(null) [2015-05-07T00:01:26.228] gres/gram: state for smurf01 [2015-05-07T00:01:26.228] gres_cnt found:6000 configured:6000 avail:6000 no_consume [2015-05-07T00:01:26.228] gres_bit_alloc:NULL [2015-05-07T00:01:26.228] gres_used:(null) [2015-05-07T00:01:26.228] gres/scratch: state for smurf01 [2015-05-07T00:01:26.228] gres_cnt found:1300 configured:1300 avail:1300 alloc:0 [2015-05-07T00:01:26.228] gres_bit_alloc:NULL [2015-05-07T00:01:26.228] 
gres_used:(null) [2015-05-07T00:01:26.228] debug: validate_node_specs: node smurf01 registered with 0 jobs [2015-05-07T00:01:26.228] debug2: _slurm_rpc_node_registration complete for smurf01 usec=598 [2015-05-07T00:01:26.551] debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from uid=0 [2015-05-07T00:01:26.551] gres/gpu: state for smurf05 [2015-05-07T00:01:26.551] gres_cnt found:4 configured:4 avail:4 alloc:0 [2015-05-07T00:01:26.551] gres_bit_alloc: [2015-05-07T00:01:26.551] gres_used:(null) [2015-05-07T00:01:26.551] topo_cpus_bitmap[0]:NULL [2015-05-07T00:01:26.551] topo_gres_bitmap[0]:0-3 [2015-05-07T00:01:26.551] topo_gres_cnt_alloc[0]:0 [2015-05-07T00:01:26.551] topo_gres_cnt_avail[0]:4 [2015-05-07T00:01:26.551] type[0]:gtx [2015-05-07T00:01:26.551] type_cnt_alloc[0]:0 [2015-05-07T00:01:26.551] type_cnt_avail[0]:4 [2015-05-07T00:01:26.551] type[0]:gtx [2015-05-07T00:01:26.551] gres/ram: state for smurf05 [2015-05-07T00:01:26.551] gres_cnt found:256 configured:256 avail:256 alloc:0 [2015-05-07T00:01:26.551] gres_bit_alloc:NULL [2015-05-07T00:01:26.551] gres_used:(null) [2015-05-07T00:01:26.551] gres/gram: state for smurf05 [2015-05-07T00:01:26.551] gres_cnt found:6000 configured:6000 avail:6000 no_consume [2015-05-07T00:01:26.551] gres_bit_alloc:NULL [2015-05-07T00:01:26.551] gres_used:(null) [2015-05-07T00:01:26.551] gres/scratch: state for smurf05 [2015-05-07T00:01:26.551] gres_cnt found:2400 configured:2400 avail:2400 alloc:0 [2015-05-07T00:01:26.551] gres_bit_alloc:NULL [2015-05-07T00:01:26.552] gres_used:(null) [2015-05-07T00:01:26.552] debug: validate_node_specs: node smurf05 registered with 0 jobs [2015-05-07T00:01:26.552] debug2: _slurm_rpc_node_registration complete for smurf05 usec=408 [2015-05-07T00:01:26.823] debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from uid=0 [2015-05-07T00:01:26.823] gres/gpu: state for smurf06 [2015-05-07T00:01:26.823] gres_cnt found:2 configured:2 avail:2 alloc:0 [2015-05-07T00:01:26.823] gres_bit_alloc: 
[2015-05-07T00:01:26.823] gres_used:(null) [2015-05-07T00:01:26.823] topo_cpus_bitmap[0]:NULL [2015-05-07T00:01:26.823] topo_gres_bitmap[0]:0-1 [2015-05-07T00:01:26.823] topo_gres_cnt_alloc[0]:0 [2015-05-07T00:01:26.823] topo_gres_cnt_avail[0]:2 [2015-05-07T00:01:26.823] type[0]:gtx [2015-05-07T00:01:26.823] type_cnt_alloc[0]:0 [2015-05-07T00:01:26.823] type_cnt_avail[0]:2 [2015-05-07T00:01:26.823] type[0]:gtx [2015-05-07T00:01:26.823] gres/ram: state for smurf06 [2015-05-07T00:01:26.823] gres_cnt found:8 configured:8 avail:8 alloc:0 [2015-05-07T00:01:26.823] gres_bit_alloc:NULL [2015-05-07T00:01:26.823] gres_used:(null) [2015-05-07T00:01:26.823] gres/gram: state for smurf06 [2015-05-07T00:01:26.823] gres_cnt found:1250 configured:1250 avail:1250 no_consume [2015-05-07T00:01:26.823] gres_bit_alloc:NULL [2015-05-07T00:01:26.823] gres_used:(null) [2015-05-07T00:01:26.823] gres/scratch: state for smurf06 [2015-05-07T00:01:26.823] gres_cnt found:1800 configured:1800 avail:1800 alloc:0 [2015-05-07T00:01:26.823] gres_bit_alloc:NULL [2015-05-07T00:01:26.823] gres_used:(null) [2015-05-07T00:01:26.823] debug: validate_node_specs: node smurf06 registered with 0 jobs [2015-05-07T00:01:26.823] debug2: _slurm_rpc_node_registration complete for smurf06 usec=323 [2015-05-07T00:01:27.389] debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from uid=0 [2015-05-07T00:01:27.389] gres/gpu: state for smurf07 [2015-05-07T00:01:27.389] gres_cnt found:2 configured:2 avail:2 alloc:0 [2015-05-07T00:01:27.389] gres_bit_alloc: [2015-05-07T00:01:27.389] gres_used:(null) [2015-05-07T00:01:27.389] topo_cpus_bitmap[0]:NULL [2015-05-07T00:01:27.390] topo_gres_bitmap[0]:0-1 [2015-05-07T00:01:27.390] topo_gres_cnt_alloc[0]:0 [2015-05-07T00:01:27.390] topo_gres_cnt_avail[0]:2 [2015-05-07T00:01:27.390] type[0]:gtx [2015-05-07T00:01:27.390] type_cnt_alloc[0]:0 [2015-05-07T00:01:27.390] type_cnt_avail[0]:2 [2015-05-07T00:01:27.390] type[0]:gtx [2015-05-07T00:01:27.390] gres/ram: state for smurf07 
[2015-05-07T00:01:27.390] gres_cnt found:16 configured:16 avail:16 alloc:0 [2015-05-07T00:01:27.390] gres_bit_alloc:NULL [2015-05-07T00:01:27.390] gres_used:(null) [2015-05-07T00:01:27.390] gres/gram: state for smurf07 [2015-05-07T00:01:27.390] gres_cnt found:1250 configured:1250 avail:1250 no_consume [2015-05-07T00:01:27.390] gres_bit_alloc:NULL [2015-05-07T00:01:27.390] gres_used:(null) [2015-05-07T00:01:27.390] gres/scratch: state for smurf07 [2015-05-07T00:01:27.390] gres_cnt found:54 configured:54 avail:54 alloc:0 [2015-05-07T00:01:27.390] gres_bit_alloc:NULL [2015-05-07T00:01:27.390] gres_used:(null) [2015-05-07T00:01:27.390] debug: validate_node_specs: node smurf07 registered with 0 jobs [2015-05-07T00:01:27.390] debug2: _slurm_rpc_node_registration complete for smurf07 usec=304 [2015-05-07T00:01:28.392] debug: Spawning registration agent for smurf[01-07] 7 hosts [2015-05-07T00:01:28.392] debug2: Spawning RPC agent for msg_type REQUEST_NODE_REGISTRATION_STATUS [2015-05-07T00:01:28.392] SchedulerParameters=default_queue_depth=100,max_rpc_cnt=0,max_sched_time=4,partition_job_depth=0 [2015-05-07T00:01:28.392] debug: sched: Running job scheduler [2015-05-07T00:01:28.392] debug2: got 1 threads to send out [2015-05-07T00:01:28.392] debug3: Tree sending to smurf01 [2015-05-07T00:01:28.392] debug3: Tree sending to smurf02 [2015-05-07T00:01:28.392] debug3: Tree sending to smurf03 [2015-05-07T00:01:28.392] debug3: Tree sending to smurf04 [2015-05-07T00:01:28.393] debug2: Tree head got back 0 looking for 7 [2015-05-07T00:01:28.393] debug4: orig_timeout was 10000 we have 0 steps and a timeout of 10000 [2015-05-07T00:01:28.393] debug4: orig_timeout was 10000 we have 0 steps and a timeout of 10000 [2015-05-07T00:01:28.393] debug4: orig_timeout was 10000 we have 0 steps and a timeout of 10000 [2015-05-07T00:01:28.393] debug3: Tree sending to smurf06 [2015-05-07T00:01:28.393] debug4: orig_timeout was 10000 we have 0 steps and a timeout of 10000 [2015-05-07T00:01:28.393] 
debug3: Tree sending to smurf07 [2015-05-07T00:01:28.393] debug3: Tree sending to smurf05 [2015-05-07T00:01:28.393] debug4: orig_timeout was 10000 we have 0 steps and a timeout of 10000 [2015-05-07T00:01:28.393] debug4: orig_timeout was 10000 we have 0 steps and a timeout of 10000 [2015-05-07T00:01:28.394] debug4: orig_timeout was 10000 we have 0 steps and a timeout of 10000 [2015-05-07T00:01:28.394] debug2: Tree head got back 1 [2015-05-07T00:01:28.394] debug2: Tree head got back 2 [2015-05-07T00:01:28.394] debug2: Tree head got back 3 [2015-05-07T00:01:28.395] debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from uid=0 [2015-05-07T00:01:28.395] gres/gpu: state for smurf02 [2015-05-07T00:01:28.395] gres_cnt found:8 configured:8 avail:8 alloc:0 [2015-05-07T00:01:28.395] gres_bit_alloc: [2015-05-07T00:01:28.395] gres_used:(null) [2015-05-07T00:01:28.395] topo_cpus_bitmap[0]:NULL [2015-05-07T00:01:28.395] topo_gres_bitmap[0]:0-7 [2015-05-07T00:01:28.395] topo_gres_cnt_alloc[0]:0 [2015-05-07T00:01:28.395] topo_gres_cnt_avail[0]:8 [2015-05-07T00:01:28.395] type[0]:tesla [2015-05-07T00:01:28.395] type_cnt_alloc[0]:0 [2015-05-07T00:01:28.395] type_cnt_avail[0]:8 [2015-05-07T00:01:28.395] type[0]:tesla [2015-05-07T00:01:28.395] gres/ram: state for smurf02 [2015-05-07T00:01:28.395] gres_cnt found:48 configured:48 avail:48 alloc:0 [2015-05-07T00:01:28.395] gres_bit_alloc:NULL [2015-05-07T00:01:28.395] gres_used:(null) [2015-05-07T00:01:28.395] gres/gram: state for smurf02 [2015-05-07T00:01:28.395] gres_cnt found:6000 configured:6000 avail:6000 no_consume [2015-05-07T00:01:28.395] gres_bit_alloc:NULL [2015-05-07T00:01:28.395] gres_used:(null) [2015-05-07T00:01:28.395] gres/scratch: state for smurf02 [2015-05-07T00:01:28.395] gres_cnt found:1300 configured:1300 avail:1300 alloc:0 [2015-05-07T00:01:28.395] gres_bit_alloc:NULL [2015-05-07T00:01:28.395] gres_used:(null) [2015-05-07T00:01:28.395] debug2: Tree head got back 5 [2015-05-07T00:01:28.395] debug2: Processing 
RPC: MESSAGE_NODE_REGISTRATION_STATUS from uid=0 [2015-05-07T00:01:28.395] debug2: Tree head got back 6 [2015-05-07T00:01:28.395] gres/gpu: state for smurf01 [2015-05-07T00:01:28.395] gres_cnt found:8 configured:8 avail:8 alloc:0 [2015-05-07T00:01:28.395] gres_bit_alloc: [2015-05-07T00:01:28.395] gres_used:(null) [2015-05-07T00:01:28.395] topo_cpus_bitmap[0]:NULL [2015-05-07T00:01:28.395] topo_gres_bitmap[0]:0 [2015-05-07T00:01:28.395] topo_gres_cnt_alloc[0]:0 [2015-05-07T00:01:28.395] topo_gres_cnt_avail[0]:1 [2015-05-07T00:01:28.395] type[0]:tesla [2015-05-07T00:01:28.395] topo_cpus_bitmap[1]:NULL [2015-05-07T00:01:28.395] topo_gres_bitmap[1]:1 [2015-05-07T00:01:28.395] topo_gres_cnt_alloc[1]:0 [2015-05-07T00:01:28.395] topo_gres_cnt_avail[1]:1 [2015-05-07T00:01:28.395] type[1]:tesla [2015-05-07T00:01:28.395] topo_cpus_bitmap[2]:NULL [2015-05-07T00:01:28.395] topo_gres_bitmap[2]:2 [2015-05-07T00:01:28.395] topo_gres_cnt_alloc[2]:0 [2015-05-07T00:01:28.395] topo_gres_cnt_avail[2]:1 [2015-05-07T00:01:28.395] type[2]:tesla [2015-05-07T00:01:28.395] topo_cpus_bitmap[3]:NULL [2015-05-07T00:01:28.395] debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from uid=0 [2015-05-07T00:01:28.395] topo_gres_bitmap[3]:3 [2015-05-07T00:01:28.395] topo_gres_cnt_alloc[3]:0 [2015-05-07T00:01:28.395] topo_gres_cnt_avail[3]:1 [2015-05-07T00:01:28.395] type[3]:tesla [2015-05-07T00:01:28.395] topo_cpus_bitmap[4]:NULL [2015-05-07T00:01:28.395] topo_gres_bitmap[4]:4 [2015-05-07T00:01:28.395] topo_gres_cnt_alloc[4]:0 [2015-05-07T00:01:28.395] topo_gres_cnt_avail[4]:1 [2015-05-07T00:01:28.395] type[4]:tesla [2015-05-07T00:01:28.396] topo_cpus_bitmap[5]:NULL [2015-05-07T00:01:28.396] topo_gres_bitmap[5]:5 [2015-05-07T00:01:28.396] topo_gres_cnt_alloc[5]:0 [2015-05-07T00:01:28.396] topo_gres_cnt_avail[5]:1 [2015-05-07T00:01:28.396] type[5]:tesla [2015-05-07T00:01:28.396] topo_cpus_bitmap[6]:NULL [2015-05-07T00:01:28.396] topo_gres_bitmap[6]:6 [2015-05-07T00:01:28.396] 
topo_gres_cnt_alloc[6]:0 [2015-05-07T00:01:28.396] topo_gres_cnt_avail[6]:1 [2015-05-07T00:01:28.396] type[6]:tesla [2015-05-07T00:01:28.396] topo_cpus_bitmap[7]:NULL [2015-05-07T00:01:28.396] topo_gres_bitmap[7]:7 [2015-05-07T00:01:28.396] topo_gres_cnt_alloc[7]:0 [2015-05-07T00:01:28.396] topo_gres_cnt_avail[7]:1 [2015-05-07T00:01:28.396] type[7]:tesla [2015-05-07T00:01:28.396] type_cnt_alloc[0]:0 [2015-05-07T00:01:28.396] type_cnt_avail[0]:8 [2015-05-07T00:01:28.396] type[0]:tesla [2015-05-07T00:01:28.396] gres/ram: state for smurf01 [2015-05-07T00:01:28.396] debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from uid=0 [2015-05-07T00:01:28.396] debug2: _slurm_rpc_node_registration complete for smurf02 usec=314 [2015-05-07T00:01:28.396] gres_cnt found:48 configured:48 avail:48 alloc:0 [2015-05-07T00:01:28.396] debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from uid=0 [2015-05-07T00:01:28.396] debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from uid=0 [2015-05-07T00:01:28.396] gres_bit_alloc:NULL [2015-05-07T00:01:28.396] gres_used:(null) [2015-05-07T00:01:28.396] gres/gram: state for smurf01 [2015-05-07T00:01:28.396] gres_cnt found:6000 configured:6000 avail:6000 no_consume [2015-05-07T00:01:28.396] gres_bit_alloc:NULL [2015-05-07T00:01:28.396] gres_used:(null) [2015-05-07T00:01:28.396] gres/scratch: state for smurf01 [2015-05-07T00:01:28.396] gres_cnt found:1300 configured:1300 avail:1300 alloc:0 [2015-05-07T00:01:28.396] gres_bit_alloc:NULL [2015-05-07T00:01:28.396] gres_used:(null) [2015-05-07T00:01:28.396] debug2: Tree head got back 7 [2015-05-07T00:01:28.396] debug2: _slurm_rpc_node_registration complete for smurf01 usec=1060 [2015-05-07T00:01:28.396] debug2: Tree head got back 7 [2015-05-07T00:01:28.396] gres/gpu: state for smurf04 [2015-05-07T00:01:28.396] gres_cnt found:4 configured:4 avail:4 alloc:0 [2015-05-07T00:01:28.396] gres_bit_alloc: [2015-05-07T00:01:28.396] gres_used:(null) [2015-05-07T00:01:28.396] 
topo_cpus_bitmap[0]:NULL [2015-05-07T00:01:28.396] topo_gres_bitmap[0]:0-3 [2015-05-07T00:01:28.396] topo_gres_cnt_alloc[0]:0 [2015-05-07T00:01:28.396] topo_gres_cnt_avail[0]:4 [2015-05-07T00:01:28.396] type[0]:gtx [2015-05-07T00:01:28.396] type_cnt_alloc[0]:0 [2015-05-07T00:01:28.396] type_cnt_avail[0]:4 [2015-05-07T00:01:28.396] type[0]:gtx [2015-05-07T00:01:28.396] gres/ram: state for smurf04 [2015-05-07T00:01:28.396] gres_cnt found:94 configured:94 avail:94 alloc:0 [2015-05-07T00:01:28.396] gres_bit_alloc:NULL [2015-05-07T00:01:28.396] gres_used:(null) [2015-05-07T00:01:28.396] gres/gram: state for smurf04 [2015-05-07T00:01:28.396] gres_cnt found:1500 configured:1500 avail:1500 no_consume [2015-05-07T00:01:28.396] gres_bit_alloc:NULL [2015-05-07T00:01:28.396] gres_used:(null) [2015-05-07T00:01:28.396] gres/scratch: state for smurf04 [2015-05-07T00:01:28.396] gres_cnt found:280 configured:280 avail:280 alloc:0 [2015-05-07T00:01:28.396] gres_bit_alloc:NULL [2015-05-07T00:01:28.396] gres_used:(null) [2015-05-07T00:01:28.396] debug2: _slurm_rpc_node_registration complete for smurf04 usec=1102 [2015-05-07T00:01:28.396] gres/gpu: state for smurf06 [2015-05-07T00:01:28.396] gres_cnt found:2 configured:2 avail:2 alloc:0 [2015-05-07T00:01:28.396] gres_bit_alloc: [2015-05-07T00:01:28.397] gres_used:(null) [2015-05-07T00:01:28.397] topo_cpus_bitmap[0]:NULL [2015-05-07T00:01:28.397] topo_gres_bitmap[0]:0-1 [2015-05-07T00:01:28.397] topo_gres_cnt_alloc[0]:0 [2015-05-07T00:01:28.397] topo_gres_cnt_avail[0]:2 [2015-05-07T00:01:28.397] type[0]:gtx [2015-05-07T00:01:28.397] type_cnt_alloc[0]:0 [2015-05-07T00:01:28.397] type_cnt_avail[0]:2 [2015-05-07T00:01:28.397] type[0]:gtx [2015-05-07T00:01:28.397] gres/ram: state for smurf06 [2015-05-07T00:01:28.397] gres_cnt found:8 configured:8 avail:8 alloc:0 [2015-05-07T00:01:28.397] gres_bit_alloc:NULL [2015-05-07T00:01:28.397] gres_used:(null) [2015-05-07T00:01:28.397] gres/gram: state for smurf06 [2015-05-07T00:01:28.397] gres_cnt 
found:1250 configured:1250 avail:1250 no_consume [2015-05-07T00:01:28.397] gres_bit_alloc:NULL [2015-05-07T00:01:28.397] gres_used:(null) [2015-05-07T00:01:28.397] gres/scratch: state for smurf06 [2015-05-07T00:01:28.397] gres_cnt found:1800 configured:1800 avail:1800 alloc:0 [2015-05-07T00:01:28.397] gres_bit_alloc:NULL [2015-05-07T00:01:28.397] gres_used:(null) [2015-05-07T00:01:28.397] debug2: _slurm_rpc_node_registration complete for smurf06 usec=982 [2015-05-07T00:01:28.397] gres/gpu: state for smurf03 [2015-05-07T00:01:28.397] gres_cnt found:3 configured:3 avail:3 alloc:0 [2015-05-07T00:01:28.397] gres_bit_alloc: [2015-05-07T00:01:28.397] gres_used:(null) [2015-05-07T00:01:28.397] topo_cpus_bitmap[0]:NULL [2015-05-07T00:01:28.397] topo_gres_bitmap[0]:0-2 [2015-05-07T00:01:28.397] topo_gres_cnt_alloc[0]:0 [2015-05-07T00:01:28.397] topo_gres_cnt_avail[0]:3 [2015-05-07T00:01:28.397] type[0]:gtx [2015-05-07T00:01:28.397] type_cnt_alloc[0]:0 [2015-05-07T00:01:28.397] type_cnt_avail[0]:3 [2015-05-07T00:01:28.397] type[0]:gtx [2015-05-07T00:01:28.397] gres/ram: state for smurf03 [2015-05-07T00:01:28.397] gres_cnt found:94 configured:94 avail:94 alloc:0 [2015-05-07T00:01:28.397] gres_bit_alloc:NULL [2015-05-07T00:01:28.397] gres_used:(null) [2015-05-07T00:01:28.397] gres/gram: state for smurf03 [2015-05-07T00:01:28.397] gres_cnt found:1500 configured:1500 avail:1500 no_consume [2015-05-07T00:01:28.397] gres_bit_alloc:NULL [2015-05-07T00:01:28.397] gres_used:(null) [2015-05-07T00:01:28.397] gres/scratch: state for smurf03 [2015-05-07T00:01:28.397] gres_cnt found:280 configured:280 avail:280 alloc:0 [2015-05-07T00:01:28.397] gres_bit_alloc:NULL [2015-05-07T00:01:28.397] gres_used:(null) [2015-05-07T00:01:28.397] debug2: _slurm_rpc_node_registration complete for smurf03 usec=1202 [2015-05-07T00:01:28.397] gres/gpu: state for smurf05 [2015-05-07T00:01:28.397] gres_cnt found:4 configured:4 avail:4 alloc:0 [2015-05-07T00:01:28.397] gres_bit_alloc: [2015-05-07T00:01:28.397] 
gres_used:(null) [2015-05-07T00:01:28.397] topo_cpus_bitmap[0]:NULL [2015-05-07T00:01:28.397] topo_gres_bitmap[0]:0-3 [2015-05-07T00:01:28.397] topo_gres_cnt_alloc[0]:0 [2015-05-07T00:01:28.397] topo_gres_cnt_avail[0]:4 [2015-05-07T00:01:28.397] type[0]:gtx [2015-05-07T00:01:28.397] type_cnt_alloc[0]:0 [2015-05-07T00:01:28.397] type_cnt_avail[0]:4 [2015-05-07T00:01:28.397] type[0]:gtx [2015-05-07T00:01:28.397] gres/ram: state for smurf05 [2015-05-07T00:01:28.397] gres_cnt found:256 configured:256 avail:256 alloc:0 [2015-05-07T00:01:28.397] gres_bit_alloc:NULL [2015-05-07T00:01:28.397] gres_used:(null) [2015-05-07T00:01:28.397] gres/gram: state for smurf05 [2015-05-07T00:01:28.397] gres_cnt found:6000 configured:6000 avail:6000 no_consume [2015-05-07T00:01:28.397] gres_bit_alloc:NULL [2015-05-07T00:01:28.397] gres_used:(null) [2015-05-07T00:01:28.397] gres/scratch: state for smurf05 [2015-05-07T00:01:28.397] gres_cnt found:2400 configured:2400 avail:2400 alloc:0 [2015-05-07T00:01:28.397] gres_bit_alloc:NULL [2015-05-07T00:01:28.397] gres_used:(null) [2015-05-07T00:01:28.397] debug2: _slurm_rpc_node_registration complete for smurf05 usec=1494 [2015-05-07T00:01:28.398] debug2: Processing RPC: MESSAGE_NODE_REGISTRATION_STATUS from uid=0 [2015-05-07T00:01:28.398] gres/gpu: state for smurf07 [2015-05-07T00:01:28.398] gres_cnt found:2 configured:2 avail:2 alloc:0 [2015-05-07T00:01:28.398] gres_bit_alloc: [2015-05-07T00:01:28.398] gres_used:(null) [2015-05-07T00:01:28.398] topo_cpus_bitmap[0]:NULL [2015-05-07T00:01:28.398] topo_gres_bitmap[0]:0-1 [2015-05-07T00:01:28.398] topo_gres_cnt_alloc[0]:0 [2015-05-07T00:01:28.398] topo_gres_cnt_avail[0]:2 [2015-05-07T00:01:28.398] type[0]:gtx [2015-05-07T00:01:28.398] type_cnt_alloc[0]:0 [2015-05-07T00:01:28.398] type_cnt_avail[0]:2 [2015-05-07T00:01:28.398] type[0]:gtx [2015-05-07T00:01:28.398] gres/ram: state for smurf07 [2015-05-07T00:01:28.398] gres_cnt found:16 configured:16 avail:16 alloc:0 [2015-05-07T00:01:28.398] 
gres_bit_alloc:NULL
[2015-05-07T00:01:28.398] gres_used:(null)
[2015-05-07T00:01:28.398] gres/gram: state for smurf07
[2015-05-07T00:01:28.398] gres_cnt found:1250 configured:1250 avail:1250 no_consume
[2015-05-07T00:01:28.398] gres_bit_alloc:NULL
[2015-05-07T00:01:28.398] gres_used:(null)
[2015-05-07T00:01:28.398] gres/scratch: state for smurf07
[2015-05-07T00:01:28.398] gres_cnt found:54 configured:54 avail:54 alloc:0
[2015-05-07T00:01:28.398] gres_bit_alloc:NULL
[2015-05-07T00:01:28.398] gres_used:(null)
[2015-05-07T00:01:28.398] debug2: _slurm_rpc_node_registration complete for smurf07 usec=267
[2015-05-07T00:01:28.407] debug2: node_did_resp smurf02
[2015-05-07T00:01:28.407] debug2: node_did_resp smurf01
[2015-05-07T00:01:28.407] debug2: node_did_resp smurf04
[2015-05-07T00:01:28.407] debug2: node_did_resp smurf05
[2015-05-07T00:01:28.407] debug2: node_did_resp smurf06
[2015-05-07T00:01:28.407] debug2: node_did_resp smurf03
[2015-05-07T00:01:28.407] debug2: node_did_resp smurf07
[2015-05-07T00:01:55.214] debug: backfill: beginning
[2015-05-07T00:01:55.214] debug: backfill: no jobs to backfill
[2015-05-07T00:01:55.612] debug2: Testing job time limits and checkpoints
[2015-05-07T00:02:25.851] debug2: Testing job time limits and checkpoints
[2015-05-07T00:02:25.851] debug2: Performing purge of old job records
[2015-05-07T00:02:25.851] debug: sched: Running job scheduler
[2015-05-07T00:02:38.505] debug2: sched: Processing RPC: REQUEST_RESOURCE_ALLOCATION from uid=507
[2015-05-07T00:02:38.505] debug3: JobDesc: user_id=507 job_id=N/A partition=(null) name=test.sh
[2015-05-07T00:02:38.506] debug3: cpus=1-4294967294 pn_min_cpus=-1 core_spec=-1
[2015-05-07T00:02:38.506] debug3: -N min-[max]: 1-[4294967294]:65534:65534:65534
[2015-05-07T00:02:38.506] debug3: pn_min_memory_job=-1 pn_min_tmp_disk=-1
[2015-05-07T00:02:38.506] debug3: immediate=0 features=(null) reservation=(null)
[2015-05-07T00:02:38.506] debug3: req_nodes=(null) exc_nodes=(null) gres=gpu:1
[2015-05-07T00:02:38.506] debug3: time_limit=-1--1 priority=-1 contiguous=0 shared=-1
[2015-05-07T00:02:38.506] debug3: kill_on_node_fail=-1 script=(null)
[2015-05-07T00:02:38.506] debug3: argv="./test/test.sh"
[2015-05-07T00:02:38.506] debug3: stdin=(null) stdout=(null) stderr=(null)
[2015-05-07T00:02:38.506] debug3: work_dir=/home/weber alloc_node:sid=HOST:20491
[2015-05-07T00:02:38.506] debug3: resp_host=192.168.1.1 alloc_resp_port=56818 other_port=58417
[2015-05-07T00:02:38.506] debug3: dependency=(null) account=(null) qos=(null) comment=(null)
[2015-05-07T00:02:38.506] debug3: mail_type=0 mail_user=(null) nice=0 num_tasks=1 open_mode=0 overcommit=-1 acctg_freq=(null)
[2015-05-07T00:02:38.506] debug3: network=(null) begin=Unknown cpus_per_task=-1 requeue=-1 licenses=(null)
[2015-05-07T00:02:38.506] debug3: end_time=Unknown signal=0@0 wait_all_nodes=-1
[2015-05-07T00:02:38.506] debug3: ntasks_per_node=-1 ntasks_per_socket=-1 ntasks_per_core=-1
[2015-05-07T00:02:38.506] debug3: mem_bind=65534:(null) plane_size:65534
[2015-05-07T00:02:38.506] debug3: array_inx=(null)
[2015-05-07T00:02:38.506] debug3: User (null)(507) doesn't have a default account
[2015-05-07T00:02:38.506] debug3: User (null)(507) doesn't have a default account
[2015-05-07T00:02:38.506] debug3: found correct qos
[2015-05-07T00:02:38.506] debug3: before alteration asking for nodes 1-4294967294 cpus 1-4294967294
[2015-05-07T00:02:38.506] debug3: after alteration asking for nodes 1-4294967294 cpus 1-4294967294
[2015-05-07T00:02:38.506] gres: gpu state for job 157
[2015-05-07T00:02:38.506] gres_cnt:1 node_cnt:0 type:(null)
[2015-05-07T00:02:38.506] debug2: initial priority for job 157 is 1
[2015-05-07T00:02:38.506] debug2: found 1 usable nodes from config containing smurf01
[2015-05-07T00:02:38.506] debug2: found 1 usable nodes from config containing smurf02
[2015-05-07T00:02:38.506] debug2: found 1 usable nodes from config containing smurf03
[2015-05-07T00:02:38.506] debug2: found 1 usable nodes from config containing smurf04
[2015-05-07T00:02:38.506] debug2: found 1 usable nodes from config containing smurf05
[2015-05-07T00:02:38.506] debug2: found 1 usable nodes from config containing smurf06
[2015-05-07T00:02:38.506] debug2: found 1 usable nodes from config containing smurf07
[2015-05-07T00:02:38.506] debug3: _pick_best_nodes: job 157 idle_nodes 7 share_nodes 7
[2015-05-07T00:02:38.506] _pick_best_nodes: job 157 never runnable
[2015-05-07T00:02:38.506] debug2: Spawning RPC agent for msg_type SRUN_JOB_COMPLETE
[2015-05-07T00:02:38.506] debug3: User (null)(507) doesn't have a default account
[2015-05-07T00:02:38.506] _slurm_rpc_allocate_resources: Requested node configuration is not available
[2015-05-07T00:02:38.506] debug2: got 1 threads to send out
[2015-05-07T00:02:38.507] debug3: slurm_send_only_node_msg: sent 0
[2015-05-07T00:02:55.022] debug2: Testing job time limits and checkpoints
[2015-05-07T00:02:55.214] debug: backfill: beginning
[2015-05-07T00:02:55.214] debug: backfill: no jobs to backfill
[2015-05-07T00:03:25.171] debug2: Testing job time limits and checkpoints
[2015-05-07T00:03:25.171] debug2: Performing purge of old job records
[2015-05-07T00:03:25.171] debug: sched: Running job scheduler
[2015-05-07T00:03:26.174] debug4: No backup slurmctld to ping
[2015-05-07T00:03:55.361] debug2: Testing job time limits and checkpoints
[2015-05-07T00:04:25.571] debug2: Testing job time limits and checkpoints
[2015-05-07T00:04:25.571] debug2: Performing purge of old job records
[2015-05-07T00:04:25.571] debug: sched: Running job scheduler
[2015-05-07T00:04:48.757] debug: Spawning ping agent for smurf[01-07]
[2015-05-07T00:04:48.757] debug2: Spawning RPC agent for msg_type REQUEST_PING
[2015-05-07T00:04:48.757] debug2: got 1 threads to send out
[2015-05-07T00:04:48.757] debug3: Tree sending to smurf01
[2015-05-07T00:04:48.757] debug3: Tree sending to smurf02
[2015-05-07T00:04:48.757] debug2: Tree head got back 0 looking for 7
[2015-05-07T00:04:48.757] debug3: Tree sending to smurf05
[2015-05-07T00:04:48.758] debug3: Tree sending to smurf07
[2015-05-07T00:04:48.758] debug3: Tree sending to smurf03
[2015-05-07T00:04:48.758] debug3: Tree sending to smurf04
[2015-05-07T00:04:48.758] debug3: Tree sending to smurf06
[2015-05-07T00:04:48.758] debug4: orig_timeout was 10000 we have 0 steps and a timeout of 10000
[2015-05-07T00:04:48.758] debug4: orig_timeout was 10000 we have 0 steps and a timeout of 10000
[2015-05-07T00:04:48.758] debug4: orig_timeout was 10000 we have 0 steps and a timeout of 10000
[2015-05-07T00:04:48.758] debug4: orig_timeout was 10000 we have 0 steps and a timeout of 10000
[2015-05-07T00:04:48.758] debug4: orig_timeout was 10000 we have 0 steps and a timeout of 10000
[2015-05-07T00:04:48.758] debug4: orig_timeout was 10000 we have 0 steps and a timeout of 10000
[2015-05-07T00:04:48.759] debug4: orig_timeout was 10000 we have 0 steps and a timeout of 10000
[2015-05-07T00:04:48.760] debug2: Tree head got back 1
[2015-05-07T00:04:48.760] debug2: Tree head got back 2
[2015-05-07T00:04:48.760] debug2: Tree head got back 3
[2015-05-07T00:04:48.761] debug2: Tree head got back 4
[2015-05-07T00:04:48.761] debug2: Tree head got back 5
[2015-05-07T00:04:48.761] debug2: Tree head got back 6
[2015-05-07T00:04:48.761] debug2: Tree head got back 7
[2015-05-07T00:04:48.762] debug2: node_did_resp smurf01
[2015-05-07T00:04:48.762] debug2: node_did_resp smurf02
[2015-05-07T00:04:48.762] debug2: node_did_resp smurf03
[2015-05-07T00:04:48.762] debug2: node_did_resp smurf06
[2015-05-07T00:04:48.762] debug2: node_did_resp smurf05
[2015-05-07T00:04:48.762] debug2: node_did_resp smurf07
[2015-05-07T00:04:48.762] debug2: node_did_resp smurf04
[2015-05-07T00:04:55.215] debug: backfill: beginning
[2015-05-07T00:04:55.216] debug: backfill: no jobs to backfill
[2015-05-07T00:04:55.796] debug2: Testing job time limits and checkpoints
[2015-05-07T00:05:25.981] debug2: Testing job time limits and checkpoints
[2015-05-07T00:05:25.981] debug2: Performing purge of old job records
[2015-05-07T00:05:25.981] debug2: purge_old_job: purged 1 old job records
[2015-05-07T00:05:25.981] debug: sched: Running job scheduler
[2015-05-07T00:05:27.999] debug4: No backup slurmctld to ping
[2015-05-07T00:05:55.148] debug2: Testing job time limits and checkpoints
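For anyone who wants to scan the controller log for a single job: every entry starts with a bracketed timestamp, so the text can be cut at those markers and filtered. This is only a minimal sketch, assuming Python 3.7 or later (it relies on `re.split` accepting a zero-width pattern); `split_entries` and `entries_matching` are names made up for this illustration and are not part of Slurm.

```python
import re

# Zero-width lookahead: matches the position just before a timestamp such as
# [2015-05-07T00:02:38.506], so splitting keeps the timestamp with its entry.
ENTRY_START = re.compile(r"(?=\[\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}\])")


def split_entries(text):
    """Return the log text as a list with one entry per timestamp."""
    # Collapse line breaks first, since wrapped mail text may break mid-entry.
    flat = " ".join(text.split())
    return [entry.strip() for entry in ENTRY_START.split(flat) if entry.strip()]


def entries_matching(text, needle):
    """Return only the entries that contain the substring `needle`."""
    return [entry for entry in split_entries(text) if needle in entry]


# Three entries copied from the controller log above, joined as in the mail.
sample = (
    "[2015-05-07T00:02:38.506] gres: gpu state for job 157 "
    "[2015-05-07T00:02:38.506] gres_cnt:1 node_cnt:0 type:(null) "
    "[2015-05-07T00:02:38.506] _pick_best_nodes: job 157 never runnable"
)

for entry in entries_matching(sample, "job 157"):
    print(entry)
```

Filtering on "job 157" keeps the gres state header and the "never runnable" entry; entries that do not mention the job number, such as the `gres_cnt` detail, need a second pass on the same timestamp.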
[2015-05-07T00:22:14.569] debug3: Trying to load plugin /usr/lib64/slurm/gres_gpu.so
[2015-05-07T00:22:14.569] debug: init: Gres GPU plugin loaded
[2015-05-07T00:22:14.569] debug3: Success.
[2015-05-07T00:22:14.569] debug3: Trying to load plugin /usr/lib64/slurm/gres_ram.so
[2015-05-07T00:22:14.569] debug4: /usr/lib64/slurm/gres_ram.so: Does not exist or not a regular file.
[2015-05-07T00:22:14.569] debug: gres: Couldn't find the specified plugin name for gres/ram looking at all files
[2015-05-07T00:22:14.569] debug: Cannot find plugin of type gres/ram, just track gres counts
[2015-05-07T00:22:14.569] debug3: Trying to load plugin /usr/lib64/slurm/gres_gram.so
[2015-05-07T00:22:14.569] debug4: /usr/lib64/slurm/gres_gram.so: Does not exist or not a regular file.
[2015-05-07T00:22:14.569] debug: gres: Couldn't find the specified plugin name for gres/gram looking at all files
[2015-05-07T00:22:14.570] debug: Cannot find plugin of type gres/gram, just track gres counts
[2015-05-07T00:22:14.570] debug3: Trying to load plugin /usr/lib64/slurm/gres_scratch.so
[2015-05-07T00:22:14.570] debug4: /usr/lib64/slurm/gres_scratch.so: Does not exist or not a regular file.
[2015-05-07T00:22:14.570] debug: gres: Couldn't find the specified plugin name for gres/scratch looking at all files
[2015-05-07T00:22:14.570] debug: Cannot find plugin of type gres/scratch, just track gres counts
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf02 Name=gpu Type=tesla File=/dev/nvidia[0-7] Count=8
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf02 Name=ram Count=48
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf02 Name=gram Count=6000
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf02 Name=scratch Count=1300
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf03 Name=gpu Type=gtx File=/dev/nvidia[0-2] Count=3
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf03 Name=ram Count=94
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf03 Name=gram Count=1500
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf03 Name=scratch Count=280
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf04 Name=gpu Type=gtx File=/dev/nvidia[0-3] Count=4
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf04 Name=ram Count=94
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf04 Name=gram Count=1500
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf04 Name=scratch Count=280
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf05 Name=gpu Type=gtx File=/dev/nvidia[0-3] Count=4
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf05 Name=ram Count=256
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf05 Name=gram Count=6000
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf05 Name=scratch Count=2400
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf06 Name=gpu Type=gtx File=/dev/nvidia[0-1] Count=2
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf06 Name=ram Count=8
[2015-05-07T00:22:14.571] debug: skipping GRES for NodeName=smurf06 Name=gram Count=1250
[2015-05-07T00:22:14.572] debug: skipping GRES for NodeName=smurf06 Name=scratch Count=1800
[2015-05-07T00:22:14.572] debug: skipping GRES for NodeName=smurf07 Name=gpu Type=gtx File=/dev/nvidia[0-1] Count=2
[2015-05-07T00:22:14.572] debug: skipping GRES for NodeName=smurf07 Name=ram Count=16
[2015-05-07T00:22:14.572] debug: skipping GRES for NodeName=smurf07 Name=gram Count=1250
[2015-05-07T00:22:14.572] debug: skipping GRES for NodeName=smurf07 Name=scratch Count=54
[2015-05-07T00:22:14.572] Gres Name=gpu Type=tesla Count=1 ID=7696487 File=/dev/nvidia0
[2015-05-07T00:22:14.572] Gres Name=gpu Type=tesla Count=1 ID=7696487 File=/dev/nvidia1
[2015-05-07T00:22:14.572] Gres Name=gpu Type=tesla Count=1 ID=7696487 File=/dev/nvidia2
[2015-05-07T00:22:14.572] Gres Name=gpu Type=tesla Count=1 ID=7696487 File=/dev/nvidia3
[2015-05-07T00:22:14.572] Gres Name=gpu Type=tesla Count=1 ID=7696487 File=/dev/nvidia4
[2015-05-07T00:22:14.572] Gres Name=gpu Type=tesla Count=1 ID=7696487 File=/dev/nvidia5
[2015-05-07T00:22:14.572] Gres Name=gpu Type=tesla Count=1 ID=7696487 File=/dev/nvidia6
[2015-05-07T00:22:14.572] Gres Name=gpu Type=tesla Count=1 ID=7696487 File=/dev/nvidia7
[2015-05-07T00:22:14.572] Gres Name=ram Type=(null) Count=48 ID=7168370
[2015-05-07T00:22:14.572] Gres Name=gram Type=(null) Count=6000 ID=1835102823
[2015-05-07T00:22:14.572] Gres Name=scratch Type=(null) Count=1300 ID=1641727719
[2015-05-07T00:22:14.572] gpu 0 is device number 0
[2015-05-07T00:22:14.572] gpu 1 is device number 1
[2015-05-07T00:22:14.572] gpu 2 is device number 2
[2015-05-07T00:22:14.572] gpu 3 is device number 3
[2015-05-07T00:22:14.572] gpu 4 is device number 4
[2015-05-07T00:22:14.572] gpu 5 is device number 5
[2015-05-07T00:22:14.572] gpu 6 is device number 6
[2015-05-07T00:22:14.572] gpu 7 is device number 7
[2015-05-07T00:22:14.572] debug3: Trying to load plugin /usr/lib64/slurm/topology_none.so
[2015-05-07T00:22:14.572] topology NONE plugin loaded
[2015-05-07T00:22:14.572] debug3: Success.
[2015-05-07T00:22:14.572] debug3: Trying to load plugin /usr/lib64/slurm/route_default.so
[2015-05-07T00:22:14.572] route default plugin loaded
[2015-05-07T00:22:14.572] debug3: Success.
[2015-05-07T00:22:14.572] debug2: Gathering cpu frequency information for 24 cpus
[2015-05-07T00:22:14.573] debug: cpu_freq_init: CPU:0 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.573] debug: cpu_freq_init: CPU:1 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.573] debug: cpu_freq_init: CPU:2 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.573] debug: cpu_freq_init: CPU:3 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.573] debug: cpu_freq_init: CPU:4 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.573] debug: cpu_freq_init: CPU:5 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.573] debug: cpu_freq_init: CPU:6 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.573] debug: cpu_freq_init: CPU:7 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.573] debug: cpu_freq_init: CPU:8 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.573] debug: cpu_freq_init: CPU:9 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.574] debug: cpu_freq_init: CPU:10 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.574] debug: cpu_freq_init: CPU:11 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.574] debug: cpu_freq_init: CPU:12 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.574] debug: cpu_freq_init: CPU:13 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.574] debug: cpu_freq_init: CPU:14 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.574] debug: cpu_freq_init: CPU:15 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.574] debug: cpu_freq_init: CPU:16 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.574] debug: cpu_freq_init: CPU:17 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.574] debug: cpu_freq_init: CPU:18 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.574] debug: cpu_freq_init: CPU:19 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.574] debug: cpu_freq_init: CPU:20 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.575] debug: cpu_freq_init: CPU:21 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.575] debug: cpu_freq_init: CPU:22 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.575] debug: cpu_freq_init: CPU:23 reset_freq:1600000 avail_gov:1f orig_governor:conservative
[2015-05-07T00:22:14.575] No specialized cores configured by default on this node
[2015-05-07T00:22:14.575] Resource spec: system memory limit not configured for this node
[2015-05-07T00:22:14.575] debug3: NodeName = smurf01
[2015-05-07T00:22:14.575] debug3: TopoAddr = smurf01
[2015-05-07T00:22:14.575] debug3: TopoPattern = node
[2015-05-07T00:22:14.575] debug3: CacheGroups = 0
[2015-05-07T00:22:14.575] debug3: ClusterName = egpc
[2015-05-07T00:22:14.575] debug3: Confile = `/etc/slurm/slurm.conf'
[2015-05-07T00:22:14.575] debug3: Debug = 9
[2015-05-07T00:22:14.575] debug3: CPUs = 24 (CF: 24, HW: 24)
[2015-05-07T00:22:14.575] debug3: Boards = 1 (CF: 1, HW: 1)
[2015-05-07T00:22:14.575] debug3: Sockets = 2 (CF: 2, HW: 2)
[2015-05-07T00:22:14.575] debug3: Cores = 6 (CF: 6, HW: 6)
[2015-05-07T00:22:14.575] debug3: Threads = 2 (CF: 2, HW: 2)
[2015-05-07T00:22:14.575] debug3: UpTime = 43706 = 12:08:26
[2015-05-07T00:22:14.575] debug3: Block Map = 0,12,8,20,4,16,2,14,10,22,6,18,1,13,9,21,5,17,3,15,11,23,7,19
[2015-05-07T00:22:14.575] debug3: Inverse Map = 0,12,6,18,4,16,10,22,2,14,8,20,1,13,7,19,5,17,11,23,3,15,9,21
[2015-05-07T00:22:14.575] debug3: RealMemory = 48128
[2015-05-07T00:22:14.575] debug3: TmpDisk = 51175
[2015-05-07T00:22:14.575] debug3: Epilog = `/apps/.slurm/job_epilog.sh'
[2015-05-07T00:22:14.575] debug3: Logfile = `/var/log/slurmd'
[2015-05-07T00:22:14.575] debug3: HealthCheck = `(null)'
[2015-05-07T00:22:14.575] debug3: NodeName = smurf01
[2015-05-07T00:22:14.575] debug3: NodeAddr = 192.168.1.101
[2015-05-07T00:22:14.575] debug3: Port = 6818
[2015-05-07T00:22:14.575] debug3: Prolog = `/apps/.slurm/job_prolog.sh'
[2015-05-07T00:22:14.575] debug3: TmpFS = `/tmp'
[2015-05-07T00:22:14.575] debug3: Public Cert = `(null)'
[2015-05-07T00:22:14.575] debug3: ChosLoc = `(null)'
[2015-05-07T00:22:14.575] debug3: Slurmstepd = `/usr/sbin/slurmstepd'
[2015-05-07T00:22:14.575] debug3: Spool Dir = `/var/spool/slurmd'
[2015-05-07T00:22:14.575] debug3: Pid File = `/var/run/slurmd.pid'
[2015-05-07T00:22:14.575] debug3: Slurm UID = 888
[2015-05-07T00:22:14.575] debug3: TaskProlog = `/apps/.slurm/task_prolog.sh'
[2015-05-07T00:22:14.575] debug3: TaskEpilog = `/apps/.slurm/task_epilog.sh'
[2015-05-07T00:22:14.575] debug3: TaskPluginParam = 0
[2015-05-07T00:22:14.575] debug3: Use PAM = 0
[2015-05-07T00:22:14.575] debug3: Trying to load plugin /usr/lib64/slurm/proctrack_cgroup.so
[2015-05-07T00:22:14.575] debug: Reading cgroup.conf file /etc/slurm/cgroup.conf
[2015-05-07T00:22:14.576] debug3: Success.
[2015-05-07T00:22:14.576] debug3: Trying to load plugin /usr/lib64/slurm/task_cgroup.so
[2015-05-07T00:22:14.576] debug: Reading cgroup.conf file /etc/slurm/cgroup.conf
[2015-05-07T00:22:14.576] debug: task/cgroup: now constraining jobs allocated cores
[2015-05-07T00:22:14.576] debug: task/cgroup: loaded
[2015-05-07T00:22:14.576] debug3: Success.
[2015-05-07T00:22:14.576] debug3: Trying to load plugin /usr/lib64/slurm/auth_munge.so
[2015-05-07T00:22:14.577] debug: auth plugin for Munge (http://code.google.com/p/munge/) loaded
[2015-05-07T00:22:14.577] debug3: Success.
[2015-05-07T00:22:14.577] debug: spank: opening plugin stack /etc/slurm/plugstack.conf
[2015-05-07T00:22:14.577] debug3: Trying to load plugin /usr/lib64/slurm/crypto_munge.so
[2015-05-07T00:22:14.577] Munge cryptographic signature plugin loaded
[2015-05-07T00:22:14.577] debug3: Success.
[2015-05-07T00:22:14.577] debug3: initializing slurmd spool directory
[2015-05-07T00:22:14.577] debug3: slurmd initialization successful
[2015-05-07T00:22:14.578] Warning: Core limit is only 0 KB
[2015-05-07T00:22:14.578] slurmd version 14.11.5 started
[2015-05-07T00:22:14.578] debug3: finished daemonize
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 4 ctime:150423111255 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 5 ctime:150423111403 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 6 ctime:150423111410 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 7 ctime:150423112216 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 8 ctime:150423112256 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 10 ctime:150423114801 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 11 ctime:150423115333 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 12 ctime:150423115724 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 20 ctime:150423120301 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 21 ctime:150423120417 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 25 ctime:150423120730 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 37 ctime:150423134949 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 38 ctime:150423134956 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 44 ctime:150423154343 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 45 ctime:150423154506 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 46 ctime:150423154528 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 50 ctime:150423155321 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 51 ctime:150423155405 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 52 ctime:150423155420 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 53 ctime:150423155942 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 54 ctime:150423155948 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 57 ctime:150423160717 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 58 ctime:150423160752 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 59 ctime:150423160758 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 60 ctime:150423160805 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 61 ctime:150423160808 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 62 ctime:150423160817 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 63 ctime:150423161413 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 64 ctime:150423163051 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 66 ctime:150423165808 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 67 ctime:150423170426 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 68 ctime:150423171832 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 70 ctime:150424103146 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 72 ctime:150424105627 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 73 ctime:150424105655 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 74 ctime:150424105703 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 76 ctime:150424110148 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 77 ctime:150424111327 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 87 ctime:150506120033 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 89 ctime:150506122813 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 90 ctime:150506122900 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 91 ctime:150506122909 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 93 ctime:150506122943 expires:700101010000
[2015-05-07T00:22:14.578] debug3: cred_unpack: job 99 ctime:150506124130 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 100 ctime:150506124255 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 110 ctime:150506130254 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 125 ctime:150506200043 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 126 ctime:150506200055 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 127 ctime:150506200225 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 128 ctime:150506200337 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 129 ctime:150506200449 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 130 ctime:150506200529 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 131 ctime:150506200754 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 132 ctime:150506200929 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 133 ctime:150506201334 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 134 ctime:150506201618 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 135 ctime:150506201635 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 136 ctime:150506201724 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 137 ctime:150506201809 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 138 ctime:150506201821 expires:700101010000
[2015-05-07T00:22:14.579] debug3: cred_unpack: job 139 ctime:150506202208 expires:700101010000
[2015-05-07T00:22:14.579] debug3: Trying to load plugin /usr/lib64/slurm/jobacct_gather_linux.so
[2015-05-07T00:22:14.579] debug: Job accounting gather LINUX plugin loaded
[2015-05-07T00:22:14.579] debug3: Success.
[2015-05-07T00:22:14.579] debug3: Trying to load plugin /usr/lib64/slurm/job_container_none.so
[2015-05-07T00:22:14.579] debug: job_container none plugin loaded
[2015-05-07T00:22:14.579] debug3: Success.
[2015-05-07T00:22:14.579] debug3: Trying to load plugin /usr/lib64/slurm/core_spec_none.so
[2015-05-07T00:22:14.579] debug3: Success.
[2015-05-07T00:22:14.579] debug3: Trying to load plugin /usr/lib64/slurm/switch_none.so
[2015-05-07T00:22:14.579] debug: switch NONE plugin loaded
[2015-05-07T00:22:14.579] debug3: Success.
[2015-05-07T00:22:14.581] debug3: successfully opened slurm listen port 192.168.1.101:6818
[2015-05-07T00:22:14.581] slurmd started on Thu, 07 May 2015 00:22:14 +0200
[2015-05-07T00:22:14.583] CPUs=24 Boards=1 Sockets=2 Cores=6 Threads=2 Memory=48128 TmpDisk=51175 Uptime=43706 CPUSpecList=(null)
[2015-05-07T00:22:14.583] debug3: Trying to load plugin /usr/lib64/slurm/acct_gather_energy_none.so
[2015-05-07T00:22:14.584] debug: AcctGatherEnergy NONE plugin loaded
[2015-05-07T00:22:14.584] debug3: Success.
[2015-05-07T00:22:14.584] debug3: Trying to load plugin /usr/lib64/slurm/acct_gather_profile_none.so
[2015-05-07T00:22:14.584] debug: AcctGatherProfile NONE plugin loaded
[2015-05-07T00:22:14.584] debug3: Success.
[2015-05-07T00:22:14.584] debug3: Trying to load plugin /usr/lib64/slurm/acct_gather_infiniband_none.so
[2015-05-07T00:22:14.584] debug: AcctGatherInfiniband NONE plugin loaded
[2015-05-07T00:22:14.584] debug3: Success.
[2015-05-07T00:22:14.584] debug3: Trying to load plugin /usr/lib64/slurm/acct_gather_filesystem_none.so
[2015-05-07T00:22:14.584] debug: AcctGatherFilesystem NONE plugin loaded
[2015-05-07T00:22:14.584] debug3: Success.
[2015-05-07T00:22:14.584] debug2: No acct_gather.conf file (/etc/slurm/acct_gather.conf)
