On Tue, 25 May 2021 21:46:27 GMT, Hao Tang <github.com+7947546+tanghaot...@openjdk.org> wrote:
>> OperatingSystemImpl.getCpuLoad() may return 1.0 in a container, even though
>> the CPU load is obviously below 100%.
>>
>> We created a 5-core container and ran 4 "while (true)" loops in the
>> container. OperatingSystemImpl.getCpuLoad() returned 1.0, which is incorrect
>> (0.8 is correct).
>> "systemLoad" in getCpuLoad() is exactly 4.0 before "systemLoad =
>> Math.min(1.0, systemLoad);". The problem is caused by using the elapsed time
>> (specified by "cpu.cfs_period_us") instead of the total CPU time (specified
>> by "cpu.cfs_quota_us"). Therefore, it is more reasonable to divide cpu usage
>> time by "quotaNanos" instead of "elapsedNanos".
>
> Hao Tang has updated the pull request incrementally with two additional
> commits since the last revision:
>
> - Use historical-value-based formula for both cpu-quota-based and
> cpu-shares-based calculation
> - rename usageTicks and totalTicks

I haven't yet followed all of the discussion in this review, but I am
concerned that this PR changes the meaning of `getCpuLoad()`.
`getCpuLoad()` has been based on the total time since the start of the
container, but after this PR it is based on the ticks since the previous
call. Is that OK? IMHO it can be accepted because it is the same as the load
average on Linux, but I am concerned that we may need a CSR because this PR
changes behavior.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3656
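For illustration, a minimal Java sketch of the arithmetic discussed in this thread. `CpuLoadSketch`, `loadElapsedOnly`, `loadQuotaBased` and `DeltaSampler` are hypothetical names, not the JDK's `OperatingSystemImpl` code. The sketch assumes CPU usage and wall-clock time are already available in nanoseconds and that the container's CPU limit is `cpu.cfs_quota_us / cpu.cfs_period_us`. It reproduces the 5-core, 4-busy-loop example (1.0 vs 0.8) and contrasts a load computed since container start with one computed from the previous call's values.

```java
/**
 * Hypothetical sketch (not the JDK's OperatingSystemImpl): container CPU load
 * from CFS quota/period and CPU usage, with all times in nanoseconds.
 */
public final class CpuLoadSketch {

    /** CPUs the container may use: cpu.cfs_quota_us / cpu.cfs_period_us. */
    static double quotaCpus(long quotaUs, long periodUs) {
        return (double) quotaUs / periodUs;
    }

    /** Behaviour described in the PR: usage / elapsed counts busy CPUs, then is clamped. */
    static double loadElapsedOnly(long usageNanos, long elapsedNanos) {
        double load = (double) usageNanos / elapsedNanos; // 4 busy CPUs -> 4.0
        return Math.min(1.0, load);
    }

    /** Usage divided by the CPU time the quota allows over the same interval. */
    static double loadQuotaBased(long usageNanos, long elapsedNanos, long quotaUs, long periodUs) {
        double availableNanos = elapsedNanos * quotaCpus(quotaUs, periodUs);
        return Math.min(1.0, usageNanos / availableNanos);
    }

    /** Historical-value-based: each call measures only the interval since the previous call. */
    static final class DeltaSampler {
        private long prevUsageNanos;
        private long prevTimeNanos;

        DeltaSampler(long usageNanos, long timeNanos) {
            this.prevUsageNanos = usageNanos;
            this.prevTimeNanos = timeNanos;
        }

        double sample(long usageNanos, long timeNanos, long quotaUs, long periodUs) {
            long usageDelta = usageNanos - prevUsageNanos;
            long elapsedDelta = timeNanos - prevTimeNanos;
            prevUsageNanos = usageNanos;
            prevTimeNanos = timeNanos;
            if (elapsedDelta <= 0) {
                return -1.0; // no interval to measure; negative means "not available"
            }
            return loadQuotaBased(usageDelta, elapsedDelta, quotaUs, periodUs);
        }
    }

    public static void main(String[] args) {
        final long quotaUs = 500_000L, periodUs = 100_000L; // 5-CPU container
        final long sec = 1_000_000_000L;

        // 4 busy loops for 1 s.
        System.out.println(loadElapsedOnly(4 * sec, sec));                    // 1.0 (clamped from 4.0)
        System.out.println(loadQuotaBased(4 * sec, sec, quotaUs, periodUs)); // 0.8

        // Idle for 9 s after start, then 4 busy loops for 1 s.
        DeltaSampler sampler = new DeltaSampler(0L, 9 * sec);
        double sinceStart = loadQuotaBased(4 * sec, 10 * sec, quotaUs, periodUs);    // about 0.08
        double sinceLastCall = sampler.sample(4 * sec, 10 * sec, quotaUs, periodUs); // 0.8
        System.out.println(sinceStart + " vs " + sinceLastCall);
    }
}
```

The last comparison shows why the behavior question matters: with the same counters, a since-start formula reports about 0.08 while a since-previous-call formula reports 0.8. If `getCpuLoad()` previously followed the since-start variant, as the review comment suggests, callers would observe different values after the change.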