Good afternoon.
I am having a difficult time understanding the output of 'qacct' on our
cluster. The output of this command

    qacct -b 201601010000 -e 201612312359

is as follows:

    WALLCLOCK: 5177246915
    UTIME:     18823970903.294
    STIME:     1458328785.078
    CPU:       97217627416.480
I am assuming WALLCLOCK is the total wall time of all simulations in
seconds, and CPU is the total CPU time of all simulations in seconds.
Converting both to hours, I get 1,438,124 hours of wall time and
27,004,896 hours of CPU time.
Our HPC cluster has 1536 cores, and the core count did not change during
2016. 1536 cores running 24 hours a day for 365 days would yield
13,455,360 core-hours.
How would I go about explaining the observed discrepancy between the
maximum possible core-hours (13,455,360) and the accounted core-hours
(27,004,896), which is roughly twice as many?
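
For reference, here is a minimal Python sketch of the arithmetic above. It
assumes that all four qacct totals are in seconds, which is my reading of
the output and not something I have confirmed. The names (to_hours,
capacity_core_hours, and so on) are only illustrative. It also prints two
extra figures: the capacity for 366 days, since 2016 was a leap year, and
UTIME + STIME in hours, since that sum does not equal CPU in this output.

```python
import calendar

# Totals copied from the qacct output above; assumed to be in seconds.
WALLCLOCK_S = 5177246915
UTIME_S = 18823970903.294
STIME_S = 1458328785.078
CPU_S = 97217627416.480

CORES = 1536            # core count, unchanged during 2016
SECONDS_PER_HOUR = 3600


def to_hours(seconds):
    """Convert a duration in seconds to hours."""
    return seconds / SECONDS_PER_HOUR


def capacity_core_hours(cores, days):
    """Core-hours available if every core is busy 24 hours a day."""
    return cores * 24 * days


wall_h = to_hours(WALLCLOCK_S)
cpu_h = to_hours(CPU_S)
user_sys_h = to_hours(UTIME_S + STIME_S)

cap_365 = capacity_core_hours(CORES, 365)
days_2016 = 366 if calendar.isleap(2016) else 365
cap_2016 = capacity_core_hours(CORES, days_2016)

# Hours are truncated to whole numbers to match the figures quoted above.
print(f"wall time         : {int(wall_h):,} h")
print(f"CPU time          : {int(cpu_h):,} h")
print(f"UTIME + STIME     : {int(user_sys_h):,} h")
print(f"capacity, 365 days: {cap_365:,} core-hours")
print(f"capacity, {days_2016} days: {cap_2016:,} core-hours")
print(f"CPU / capacity    : {cpu_h / cap_2016:.2f}")
```

Even with the leap day included, the reported CPU figure stays at about
twice the maximum possible core-hours.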
This RQS rule has always been in effect:
{
   name         limit_oversubscription
   description  Prevent core over-subscription
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc
}
Thank you for your time and help.
Best regards,
Gowtham
--
Gowtham, PhD
Director of Research Computing, IT
Adj. Asst. Professor, ECE and Physics
Michigan Technological University
P: (906) 487-4096
F: (906) 487-2787
https://it.mtu.edu
https://hpc.mtu.edu