All,

I'm collecting some usage metrics for our cluster, and I'd like to look at utilisation in terms of allocated CPU % by partition: basically the equivalent of `sinfo -O cpusstate -p partition_name`, but for historical data. What's the best way to do this?
I've found that running `sacct --allusers --state=RUNNING` and summing the allocated CPUs seems to give the same results as `sinfo`, but when feeding in explicit starttime/endtime parameters it's not so clear. My naive approach of simply adding up allocated cores over, say, a 5-minute window seems to give a lower value than `sinfo`.

Does this command syntax capture only jobs that were running for the entire window, or any job that was running at some point within it? Can I query an instantaneous time rather than a window? Am I missing something else?

I've played with the `sreport` command as well, but that doesn't seem to allow specifying a particular partition to analyse.

Once I've got the general pattern down, I'd like to analyse by other job characteristics too (e.g. single-core vs. multicore).

Appreciate any guidance!

Cheers,
David
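
P.S. To make the calculation concrete, here is a rough sketch of the instantaneous per-partition sum I'm after. It is an illustration only, not something I've validated against `sinfo`: the sample rows are made up, the function names are hypothetical, and it assumes pipe-delimited output in the shape of `sacct --allusers -X -n -P --format=JobID,Partition,AllocCPUS,Start,End`, with `Unknown` as the end time of jobs that are still running.

```python
from collections import defaultdict
from datetime import datetime

# Made-up sample rows, assumed to look like the output of:
#   sacct --allusers -X -n -P --format=JobID,Partition,AllocCPUS,Start,End
SAMPLE = """\
101|compute|16|2024-01-15T09:00:00|2024-01-15T11:00:00
102|compute|4|2024-01-15T10:02:00|2024-01-15T10:03:00
103|gpu|8|2024-01-15T09:30:00|Unknown
104|compute|32|2024-01-15T10:10:00|2024-01-15T12:00:00
"""


def parse_time(text):
    """Return a datetime, or None for non-timestamp values such as 'Unknown'."""
    try:
        return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return None


def allocated_cpus_at(sacct_text, instant):
    """Sum AllocCPUS per partition over jobs with Start <= instant < End."""
    totals = defaultdict(int)
    for line in sacct_text.splitlines():
        if not line.strip():
            continue
        _job_id, partition, cpus, start, end = line.split("|")
        started = parse_time(start)
        ended = parse_time(end)
        # Skip jobs that had not started yet (or never started).
        if started is None or started > instant:
            continue
        # Skip jobs that had already finished; no end time means still running.
        if ended is not None and ended <= instant:
            continue
        totals[partition] += int(cpus)
    return dict(totals)


if __name__ == "__main__":
    print(allocated_cpus_at(SAMPLE, datetime(2024, 1, 15, 10, 5)))
```

Filtering on each job's own Start/End like this would sidestep the question of how the window is interpreted, provided the query window is wide enough to include every job that overlaps the instant of interest. Dividing each total by the partition's CPU count would then give the percentage.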