I have a job whose workload finished yesterday (successfully: no issues, and the 
output files are good), but the SLURM job is still accumulating time. I just 
suspended it, but I'd like to understand how it managed to bill so many extra hours.

The other 26 jobs in this batch completed normally. This job's script completed on 
June 7th at 7:53:19.

   JobName      State               Start    Elapsed    CPUTime 
---------- ---------- ------------------- ---------- ----------
PBMC_5c_0+  COMPLETED 2017-06-06T16:39:51   02:53:56   23:11:28 
     batch  COMPLETED 2017-06-06T16:39:51   02:53:56   23:11:28 
PBMC_6a_0+  COMPLETED 2017-06-06T16:39:51   04:54:06 1-15:12:48 
     batch  COMPLETED 2017-06-06T16:39:51   04:54:06 1-15:12:48 
PBMC_6b_0+  COMPLETED 2017-06-06T16:39:51   03:04:41 1-00:37:28 
     batch  COMPLETED 2017-06-06T16:39:51   03:04:41 1-00:37:28 
PBMC_6c_0+  SUSPENDED 2017-06-06T16:39:51 1-21:12:55 15-01:43:20 

That 15 days of CPUTime looked impossible at first, since the job only started two days ago. With 8 allocated CPUs, though, it is exactly Elapsed × 8, so the accounting itself adds up; the real question is why Elapsed kept growing long after the script finished.
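As a sanity check on the arithmetic (plain shell, nothing SLURM-specific), multiplying Elapsed by the 8 allocated CPUs does reproduce the CPUTime figure exactly:

```shell
# Convert the sacct times (D-HH:MM:SS) to seconds by hand and compare.
elapsed_s=$(( 1*86400 + 21*3600 + 12*60 + 55 ))    # Elapsed  1-21:12:55
cputime_s=$(( 15*86400 + 1*3600 + 43*60 + 20 ))    # CPUTime 15-01:43:20
echo $(( elapsed_s * 8 )) "$cputime_s"             # both come out to 1302200
```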

sstat has nothing to say. scontrol shows me nothing out of the ordinary:

[root@udc-ba34-37:~] scontrol show jobid -dd  665155
JobId=665155 JobName=PBMC_6c_020917_ATACseq.py
   JobState=SUSPENDED Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   DerivedExitCode=0:0
   RunTime=1-21:12:39 TimeLimit=3-00:00:00 TimeMin=N/A
   SubmitTime=2017-06-06T16:39:49 EligibleTime=2017-06-06T16:39:49
   StartTime=2017-06-06T16:39:51 EndTime=2017-06-09T16:39:51
   PreemptTime=None SuspendTime=2017-06-08T13:52:30 SecsPreSuspend=162759
   Partition=serial AllocNode:Sid=udc-ba34-37:199211
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=udc-ba33-28c
   BatchHost=udc-ba33-28c
   NumNodes=1 NumCPUs=8 CPUs/Task=8 ReqB:S:C:T=0:0:*:*
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
     Nodes=udc-ba33-28c CPU_IDs=2-9 Mem=32000
   MinCPUsNode=8 MinMemoryNode=32000M MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/sfs/lustre/allocations/shefflab/processed/cphg_atac/submission/PBMC_6c_020917_ATACseq.sub
   WorkDir=/sfs/lustre/allocations/shefflab/processed/cphg_atac/results_pipeline
   StdErr=/sfs/lustre/allocations/shefflab/processed/cphg_atac/submission/PBMC_6c_020917_ATACseq.log
   StdIn=/dev/null
   StdOut=/sfs/lustre/allocations/shefflab/processed/cphg_atac/submission/PBMC_6c_020917_ATACseq.log
   BatchScript=
#!/bin/bash
#SBATCH --job-name='PBMC_6c_020917_ATACseq.py'
#SBATCH --output='/sfs/lustre/allocations/shefflab/processed/cphg_atac/submission/PBMC_6c_020917_ATACseq.log'
#SBATCH --mem='32000'
#SBATCH --cpus-per-task='8'
#SBATCH --time='3-00:00:00'
#SBATCH --partition='serial'
#SBATCH -m block
#SBATCH --ntasks=1

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

/home/ns5bc/code/ATACseq/pipelines/ATACseq.py \
  --input2 /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L001_R2_001.fastq.gz \
           /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L002_R2_001.fastq.gz \
           /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L003_R2_001.fastq.gz \
           /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L004_R2_001.fastq.gz \
  --genome hg38 --single-or-paired paired --sample-name PBMC_6c_020917 \
  --input /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L001_R1_001.fastq.gz \
          /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L002_R1_001.fastq.gz \
          /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L003_R1_001.fastq.gz \
          /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L004_R1_001.fastq.gz \
  --prealignments rCRSd --genome-size hs -D \
  --frip-ref-peaks /home/ns5bc/code/cphg_atac/metadata/CD4_hotSpot_liftedhg19tohg38.bed \
  -O /sfs/lustre/allocations/shefflab/processed/cphg_atac/results_pipeline -P 8 -M 32000
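
For what it's worth, the suspend bookkeeping in the scontrol output is at least internally consistent: SuspendTime minus StartTime matches SecsPreSuspend. This is just timestamp arithmetic with GNU date, not anything SLURM-specific:

```shell
# Difference between the scontrol SuspendTime and StartTime stamps, in seconds.
start=$(date -d '2017-06-06T16:39:51' +%s)
susp=$(date -d '2017-06-08T13:52:30' +%s)
echo $(( susp - start ))    # 162759, matching SecsPreSuspend exactly
```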

Thanks!

——————— 
Alden Stradling
Research Computing Infrastructure
University of Virginia
[email protected]
