* Juergen Salk <juergen.s...@uni-ulm.de> [210515 23:54]:
> * Christopher Samuel <ch...@csamuel.org> [210514 15:47]:
>
> > > Usage reported in Percentage of Total
> > > --------------------------------------------------------------------------------
> > >   Cluster      TRES Name    Allocated     Down PLND Dow        Idle Reserved     Reported
> > > --------- -------------- ------------ -------- -------- ----------- -------- ------------
> > >       oph            cpu       81.93%    0.00%    0.00%      15.85%    2.22%      100.00%
> > >       oph            mem       80.60%    0.00%    0.00%      19.40%    0.00%      100.00%
> >
> > The "Reserved" column is the one you're interested in, it's indicating that
> > for the 13th some jobs were waiting for CPUs, not memory.
>
> However, there is also "Overcommitted" in the sreport man page which
> looks promising by description - although its exact definition
> is also not completely clear to me right away:
>
> --- snip ---
>
> Overcommitted
>
>   Time of eligible jobs waiting in the queue over the Reserved time.
>   Unlike Reserved, this has no limit. It is typically useful to
>   determine whether your system is overloaded and by how much.
>
> --- snip ---
And I just noticed that this description of "Overcommitted" in the
sreport(1) man page first appeared in versions 20.02.7 and 20.11.1,
respectively. In versions prior to 20.02.7 and 20.11.1 it still read:

--- snip ---

Overcommitted

  Time that the nodes were over allocated, either with the -O,
  --overcommit flag at submission time or OverSubscribe set to FORCE
  in the slurm.conf. This time is not counted against the total
  reported time.

--- snip ---

So I assume the description of "Overcommitted" in the sreport(1) man
page was simply wrong in older versions (unless its semantics changed
with versions 20.02.7 and 20.11.1) ...

Best regards
Jürgen
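
P.S. In case it helps anyone who wants to check the "Reserved" column
per TRES without reading the table by eye, here is a rough Python
sketch. It assumes the report is available as pipe-delimited text (I
believe sreport's -P/--parsable2 option produces that, but please
verify on your version). The SAMPLE string is hand-converted from the
table quoted above, not captured output, so the exact header names are
an assumption, and the function names are made up for illustration.

```python
# Hand-built, pipe-delimited version of the table quoted in this thread.
# Header text is an assumption; real sreport output may label columns differently.
SAMPLE = """\
Cluster|TRES Name|Allocated|Down|PLND Down|Idle|Reserved|Reported
oph|cpu|81.93%|0.00%|0.00%|15.85%|2.22%|100.00%
oph|mem|80.60%|0.00%|0.00%|19.40%|0.00%|100.00%
"""


def column_by_tres(text, column="Reserved", tres_column="TRES Name"):
    """Return {tres_name: percentage} for one column of a percent report."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = [h.strip() for h in lines[0].split("|")]
    col_idx = header.index(column)
    tres_idx = header.index(tres_column)
    values = {}
    for ln in lines[1:]:
        fields = [f.strip() for f in ln.split("|")]
        # Strip the trailing percent sign before converting to a number.
        values[fields[tres_idx]] = float(fields[col_idx].rstrip("%"))
    return values


def tres_with_waiting_jobs(text):
    """List the TRES whose Reserved share is non-zero."""
    return [name for name, pct in column_by_tres(text).items() if pct > 0.0]


if __name__ == "__main__":
    print(column_by_tres(SAMPLE))
    print(tres_with_waiting_jobs(SAMPLE))
```

For the numbers quoted above this singles out cpu and not mem, which
matches Christopher's reading of the table. Passing a different column
name (for example the Overcommitted one, if your report includes it)
works the same way, as long as the header text matches.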