Hello Steffen,
Steffen Grunewald <[email protected]> writes:
> Hello all,
>
> I've got a rather newly setup cluster, which at the moment is completely idle
> ("squeue" doesn't return anything.)
>
> From the testing phases, a couple of now unused accounts and associations are
> left, which I'd like to get rid of:
>
> [root@login ~]# sacctmgr show assoc
>    Cluster    Account       User  Partition     Share GrpJobs       GrpTRES GrpSubmit     GrpWall   GrpTRESMins MaxJobs       MaxTRES MaxTRESPerNode MaxSubmit     MaxWall   MaxTRESMins                  QOS   Def QOS GrpTRESRunMin
> ---------- ---------- ---------- ---------- --------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
> [...]
>    cluster    default                               1                                                                                                                                                  normal
>    cluster    default        tom                    1                                                                                                                                                  normal
> [...]
> [root@login ~]# sacctmgr delete user name=tom account=default
> Error with request: Job(s) active, cancel job(s) before remove
> JobID = 15498 C = cluster A = default U = tom
> JobID = 15500 C = cluster A = default U = tom
> JobID = 15501 C = cluster A = default U = tom
> JobID = 15502 C = cluster A = default U = tom
> JobID = 15503 C = cluster A = default U = tom
> JobID = 15504 C = cluster A = default U = tom
> JobID = 15505 C = cluster A = default U = tom
> JobID = 15506 C = cluster A = default U = tom
> JobID = 15508 C = cluster A = default U = tom
> JobID = 15509 C = cluster A = default U = tom
> [root@login ~]# scontrol show jobid -dd 15500
> slurm_load_jobs error: Invalid job id specified
> [root@login ~]# sacct -j 15500
>        JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
> ------------ ---------- ---------- ---------- ---------- ---------- --------
> 15500        intel-test  partition    default         48    RUNNING      0:0
>
>
> Is there a "gold standard" way to repair this?
I don't think there is a "gold standard" for this. You probably just
have to go into the database and fix it yourself.
A while ago I posted some code to fix anomalous jobs. It was intended
to make the data plausible (e.g. by adding a missing completion date for
a job with status "RUNNING" which no longer exists), and not for
deleting jobs completely, but it might help:
https://groups.google.com/forum/#!msg/slurm-devel/nf7JxV91F40/KUsS1AmyWRYJ
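A minimal sketch of that kind of repair, assuming slurmdbd's MySQL backend: it pulls the job IDs out of the sacctmgr error quoted above and builds an UPDATE that marks those stale RUNNING records as cancelled with an end time. The table name `<cluster>_job_table`, the columns `id_job`, `state` and `time_end`, and the numeric state codes (RUNNING = 1, CANCELLED = 4) are assumptions about the schema, not taken from the linked posting, and `parse_active_jobs` / `build_fix_sql` are hypothetical helpers. The script only prints SQL; review the statement and back up the database before running anything by hand.

```python
import re
from typing import List, Tuple

# Matches lines such as "JobID = 15498 C = cluster A = default U = tom",
# i.e. the list sacctmgr prints when it refuses to delete an association.
JOB_LINE = re.compile(r"JobID\s*=\s*(\d+)\s+C\s*=\s*(\S+)")

# ASSUMPTION: numeric job states follow Slurm's job_states enum
# (JOB_RUNNING = 1, JOB_CANCELLED = 4). Verify for your Slurm version.
JOB_RUNNING = 1
JOB_CANCELLED = 4


def parse_active_jobs(error_text: str) -> List[Tuple[str, int]]:
    """Return (cluster, job_id) pairs from sacctmgr's 'Job(s) active' error."""
    return [(m.group(2), int(m.group(1))) for m in JOB_LINE.finditer(error_text)]


def build_fix_sql(cluster: str, job_ids: List[int], end_time: int) -> str:
    """Build an UPDATE that closes stale RUNNING records (hypothetical helper).

    ASSUMPTION: the per-cluster table is '<cluster>_job_table' with the
    columns id_job, state and time_end (Unix seconds). Only rows that are
    still RUNNING and have no end time are matched.
    """
    # Refuse anything that is not a plain identifier, since it goes into SQL.
    if not re.fullmatch(r"\w+", cluster):
        raise ValueError(f"unexpected cluster name: {cluster!r}")
    if not job_ids:
        raise ValueError("no job IDs given")
    # Deduplicate and sort so the statement is stable and easy to review.
    ids = ", ".join(str(j) for j in sorted(set(int(j) for j in job_ids)))
    return (
        f"UPDATE `{cluster}_job_table` "
        f"SET state = {JOB_CANCELLED}, time_end = {int(end_time)} "
        f"WHERE id_job IN ({ids}) AND state = {JOB_RUNNING} AND time_end = 0;"
    )


if __name__ == "__main__":
    error = """\
JobID = 15498 C = cluster A = default U = tom
JobID = 15500 C = cluster A = default U = tom"""
    pairs = parse_active_jobs(error)
    print(build_fix_sql("cluster", [job for _, job in pairs], 1500000000))
```

Run as a script, this prints `UPDATE `cluster_job_table` SET state = 4, time_end = 1500000000 WHERE id_job IN (15498, 15500) AND state = 1 AND time_end = 0;`. The end time is yours to choose (it should not be earlier than the job's start time), and job steps are kept in a separate per-cluster step table, which this sketch leaves alone.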
Cheers,
Loris
--
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin Email [email protected]