Hi Alden,

The CPU time is probably the sum across all 8 CPU cores. By any chance is there a runaway process from the job still on the node, such as an epilog script? I am guessing...
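For what it's worth, the figures in your quoted output are consistent with that explanation: SLURM reports CPUTime as Elapsed × NCPUS, so a job that keeps accruing wall-clock time multiplies it by all 8 allocated cores. A quick sanity check in plain Python (nothing SLURM-specific; the durations are copied from your job table):

```python
def to_seconds(t):
    """Parse a SLURM duration like '1-21:12:55' or '02:53:56' into seconds."""
    days = 0
    if "-" in t:
        d, t = t.split("-")
        days = int(d)
    h, m, s = (int(x) for x in t.split(":"))
    return (days * 24 + h) * 3600 + m * 60 + s

# Suspended job PBMC_6c_0+ (NumCPUs=8): Elapsed 1-21:12:55, CPUTime 15-01:43:20
elapsed = to_seconds("1-21:12:55")
cputime = to_seconds("15-01:43:20")
print(cputime == elapsed * 8)  # True: CPUTime is exactly Elapsed x 8 cores

# Same relationship holds for the completed job PBMC_5c_0+
print(to_seconds("23:11:28") == to_seconds("02:53:56") * 8)  # True
```

So the 15 days is not extra billing per se, just ~1.9 days of wall-clock time charged across 8 cores; the real question is why the job kept running after the workload finished.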
On Thu, Jun 8, 2017 at 2:39 PM Stradling, Alden Reid (ars9ac) <[email protected]> wrote:

> I have a job whose workload finished yesterday (successfully, no issues,
> output files good), but the SLURM job is still accumulating time. I just
> suspended it, but I'd like to know how it's getting away with billing many
> extra hours.
>
> The other 26 in this batch completed normally. The job script completed
> on June 7th at 7:53:19.
>
>    JobName      State               Start    Elapsed       CPUTime
> ---------- ---------- ------------------- ---------- -------------
> PBMC_5c_0+  COMPLETED 2017-06-06T16:39:51   02:53:56      23:11:28
>      batch  COMPLETED 2017-06-06T16:39:51   02:53:56      23:11:28
> PBMC_6a_0+  COMPLETED 2017-06-06T16:39:51   04:54:06    1-15:12:48
>      batch  COMPLETED 2017-06-06T16:39:51   04:54:06    1-15:12:48
> PBMC_6b_0+  COMPLETED 2017-06-06T16:39:51   03:04:41    1-00:37:28
>      batch  COMPLETED 2017-06-06T16:39:51   03:04:41    1-00:37:28
> PBMC_6c_0+  SUSPENDED 2017-06-06T16:39:51 1-21:12:55 *15-01:43:20*
>
> That 15 days… not really possible since the job started two days ago.
>
> sstat has nothing to say.
> scontrol shows me nothing out of the ordinary:
>
> [root@udc-ba34-37:~] scontrol show jobid -dd 665155
> JobId=665155 JobName=PBMC_6c_020917_ATACseq.py
> JobState=SUSPENDED Reason=None Dependency=(null)
> Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
> DerivedExitCode=0:0
> RunTime=1-21:12:39 TimeLimit=3-00:00:00 TimeMin=N/A
> SubmitTime=2017-06-06T16:39:49 EligibleTime=2017-06-06T16:39:49
> StartTime=2017-06-06T16:39:51 EndTime=2017-06-09T16:39:51
> PreemptTime=None SuspendTime=2017-06-08T13:52:30 SecsPreSuspend=162759
> Partition=serial AllocNode:Sid=udc-ba34-37:199211
> ReqNodeList=(null) ExcNodeList=(null)
> NodeList=udc-ba33-28c
> BatchHost=udc-ba33-28c
> NumNodes=1 NumCPUs=8 CPUs/Task=8 ReqB:S:C:T=0:0:*:*
> Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
> Nodes=udc-ba33-28c CPU_IDs=2-9 Mem=32000
> MinCPUsNode=8 MinMemoryNode=32000M MinTmpDiskNode=0
> Features=(null) Gres=(null) Reservation=(null)
> Shared=OK Contiguous=0 Licenses=(null) Network=(null)
> Command=/sfs/lustre/allocations/shefflab/processed/cphg_atac/submission/PBMC_6c_020917_ATACseq.sub
> WorkDir=/sfs/lustre/allocations/shefflab/processed/cphg_atac/results_pipeline
> StdErr=/sfs/lustre/allocations/shefflab/processed/cphg_atac/submission/PBMC_6c_020917_ATACseq.log
> StdIn=/dev/null
> StdOut=/sfs/lustre/allocations/shefflab/processed/cphg_atac/submission/PBMC_6c_020917_ATACseq.log
> BatchScript=
> #!/bin/bash
> #SBATCH --job-name='PBMC_6c_020917_ATACseq.py'
> #SBATCH --output='/sfs/lustre/allocations/shefflab/processed/cphg_atac/submission/PBMC_6c_020917_ATACseq.log'
> #SBATCH --mem='32000'
> #SBATCH --cpus-per-task='8'
> #SBATCH --time='3-00:00:00'
> #SBATCH --partition='serial'
> #SBATCH -m block
> #SBATCH --ntasks=1
>
> echo 'Compute node:' `hostname`
> echo 'Start time:' `date +'%Y-%m-%d %T'`
>
> /home/ns5bc/code/ATACseq/pipelines/ATACseq.py --input2
> /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L001_R2_001.fastq.gz
> /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L002_R2_001.fastq.gz
> /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L003_R2_001.fastq.gz
> /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L004_R2_001.fastq.gz
> --genome hg38 --single-or-paired paired --sample-name PBMC_6c_020917
> --input
> /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L001_R1_001.fastq.gz
> /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L002_R1_001.fastq.gz
> /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L003_R1_001.fastq.gz
> /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L004_R1_001.fastq.gz
> --prealignments rCRSd --genome-size hs -D --frip-ref-peaks
> /home/ns5bc/code/cphg_atac/metadata/CD4_hotSpot_liftedhg19tohg38.bed
> -O /sfs/lustre/allocations/shefflab/processed/cphg_atac/results_pipeline
> -P 8 -M 32000
>
> Thanks!
>
> ———————
> Alden Stradling
> Research Computing Infrastructure
> University of Virginia
> [email protected]
