Hi Alden,

The CPU time is probably the sum of time across all 8 CPU cores. By any
chance is there a runaway process from the job still on the node, such as an
epilogue script? I am guessing...
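A quick arithmetic check is consistent with that guess: for every job in the sacct output below, CPUTime works out to exactly Elapsed × 8 cores, including the suspended one (so the "15 days" is the elapsed 1-21:12:55 multiplied by 8). A minimal Python sketch, using the figures from the sacct table; the `hms_to_seconds` parser is my own helper, not part of Slurm:

```python
def hms_to_seconds(s):
    """Parse a Slurm duration such as '02:53:56' or '15-01:43:20' into seconds."""
    days = 0
    if "-" in s:
        d, s = s.split("-")
        days = int(d)
    h, m, sec = (int(x) for x in s.split(":"))
    return ((days * 24 + h) * 3600) + m * 60 + sec

ncpus = 8
elapsed = hms_to_seconds("1-21:12:55")   # suspended job's Elapsed
cputime = hms_to_seconds("15-01:43:20")  # its reported CPUTime
print(cputime == elapsed * ncpus)        # prints True: CPUTime is exactly 8x Elapsed
```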


On Thu, Jun 8, 2017 at 2:39 PM Stradling, Alden Reid (ars9ac) <
[email protected]> wrote:

> I have a job whose workload finished yesterday (successfully, no issues,
> output files good), but the SLURM job is still accumulating time. I just
> suspended it, but I’d like to know how it’s getting away with billing many
> extra hours.
>
>  The other 26 in this batch completed normally. The job script completed
> on June 7th at 7:53:19.
>
>    JobName      State               Start    Elapsed    CPUTime
> ---------- ---------- ------------------- ---------- ----------
> PBMC_5c_0+  COMPLETED 2017-06-06T16:39:51   02:53:56   23:11:28
>      batch  COMPLETED 2017-06-06T16:39:51   02:53:56   23:11:28
> PBMC_6a_0+  COMPLETED 2017-06-06T16:39:51   04:54:06 1-15:12:48
>      batch  COMPLETED 2017-06-06T16:39:51   04:54:06 1-15:12:48
> PBMC_6b_0+  COMPLETED 2017-06-06T16:39:51   03:04:41 1-00:37:28
>      batch  COMPLETED 2017-06-06T16:39:51   03:04:41 1-00:37:28
> PBMC_6c_0+  SUSPENDED 2017-06-06T16:39:51 1-21:12:55 *15-01:43:20*
>
> That 15 days isn't really possible, since the job started two days ago.
>
> sstat has nothing to say. scontrol shows me nothing out of the ordinary:
>
> [root@udc-ba34-37:~] scontrol show jobid -dd  665155
> JobId=665155 JobName=PBMC_6c_020917_ATACseq.py
>    JobState=SUSPENDED Reason=None Dependency=(null)
>    Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
>    DerivedExitCode=0:0
>    RunTime=1-21:12:39 TimeLimit=3-00:00:00 TimeMin=N/A
>    SubmitTime=2017-06-06T16:39:49 EligibleTime=2017-06-06T16:39:49
>    StartTime=2017-06-06T16:39:51 EndTime=2017-06-09T16:39:51
>    PreemptTime=None SuspendTime=2017-06-08T13:52:30 SecsPreSuspend=162759
>    Partition=serial AllocNode:Sid=udc-ba34-37:199211
>    ReqNodeList=(null) ExcNodeList=(null)
>    NodeList=udc-ba33-28c
>    BatchHost=udc-ba33-28c
>    NumNodes=1 NumCPUs=8 CPUs/Task=8 ReqB:S:C:T=0:0:*:*
>    Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
>      Nodes=udc-ba33-28c CPU_IDs=2-9 Mem=32000
>    MinCPUsNode=8 MinMemoryNode=32000M MinTmpDiskNode=0
>    Features=(null) Gres=(null) Reservation=(null)
>    Shared=OK Contiguous=0 Licenses=(null) Network=(null)
>
>    Command=/sfs/lustre/allocations/shefflab/processed/cphg_atac/submission/PBMC_6c_020917_ATACseq.sub
>    WorkDir=/sfs/lustre/allocations/shefflab/processed/cphg_atac/results_pipeline
>    StdErr=/sfs/lustre/allocations/shefflab/processed/cphg_atac/submission/PBMC_6c_020917_ATACseq.log
>    StdIn=/dev/null
>    StdOut=/sfs/lustre/allocations/shefflab/processed/cphg_atac/submission/PBMC_6c_020917_ATACseq.log
>    BatchScript=
> #!/bin/bash
> #SBATCH --job-name='PBMC_6c_020917_ATACseq.py'
> #SBATCH --output='/sfs/lustre/allocations/shefflab/processed/cphg_atac/submission/PBMC_6c_020917_ATACseq.log'
> #SBATCH --mem='32000'
> #SBATCH --cpus-per-task='8'
> #SBATCH --time='3-00:00:00'
> #SBATCH --partition='serial'
> #SBATCH -m block
> #SBATCH --ntasks=1
>
> echo 'Compute node:' `hostname`
> echo 'Start time:' `date +'%Y-%m-%d %T'`
>
> /home/ns5bc/code/ATACseq/pipelines/ATACseq.py --input2
> /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L001_R2_001.fastq.gz
> /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L002_R2_001.fastq.gz
> /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L003_R2_001.fastq.gz
> /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L004_R2_001.fastq.gz
> --genome hg38 --single-or-paired paired --sample-name PBMC_6c_020917
> --input
> /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L001_R1_001.fastq.gz
> /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L002_R1_001.fastq.gz
> /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L003_R1_001.fastq.gz
> /sfs/lustre/allocations/shefflab/data/gsl/PBMC-6c-020917_S1_L004_R1_001.fastq.gz
> --prealignments rCRSd --genome-size hs -D --frip-ref-peaks
> /home/ns5bc/code/cphg_atac/metadata/CD4_hotSpot_liftedhg19tohg38.bed
> -O /sfs/lustre/allocations/shefflab/processed/cphg_atac/results_pipeline -P
> 8 -M 32000
>
> Thanks!
>
> ———————
> Alden Stradling
> Research Computing Infrastructure
> University of Virginia
> [email protected]
>
>
