On Mon, 1 Jul 2024 09:31:12 GMT, Kevin Walls <kev...@openjdk.org> wrote:
>> The completely unrelated fix to
>> [JDK-8335124](https://bugs.openjdk.org/browse/JDK-8335124) led me to believe
>> that the issue with sometimes not being able to get the stack trace of the
>> SteadyStateThread might be due to the thread being active for a short period
>> after being reported as in the Thread.State.BLOCKED state. Once set to that
>> state, the thread still needs to call a native OS API to block the thread so
>> it is truly idle. During this time the thread stack might be inconsistent
>> and not walk-able. The fix is to add a short sleep after the thread has
>> moved to the Thread.State.BLOCKED state to give it a chance to finish
>> blocking.
>>
>> Tested with Tier1 CI and all svc test tasks for tier2 and tier5.
>
> Looks good, let's try it!
>
> Was wondering if for the failure in ClhsdbDumpheap.java, the missing text was
> too far from when LingeredApp was started. But if it's the first subtest,
> then it's the stacks in a dumpheap output where we don't find the required
> steadyState text. So the test only has to create the array of subtests and
> call the first one, before the LingeredApp thread has really blocked...
>
> Good to make this harmless test change so we get long term testing of it.

@kevinjwalls Actually, in all cases, after launching LingeredApp and waiting for the SteadyStateThread to be "ready", there is still the launching of the clhsdb tool, which is going to take some time. It seems hard to believe that the SteadyStateThread would ever lose out on that race. I get the feeling that there may be more going on here than I initially thought. Almost all of these failures are on Windows (about 22 out of 25), with the other 3 on linux-arm. Maybe there is sometimes some sort of OS hiccup that delays the SteadyStateThread. In any case, there is no real harm with this fix, and hopefully it helps.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19951#issuecomment-2200769458
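
For readers following along, the wait-then-sleep idea from the quoted PR description can be sketched roughly as below. This is only an illustration and not the actual LingeredApp change: the class name `SteadyStateSketch`, the helper `waitUntilBlocked`, and the `SETTLE_MILLIS` value are all assumptions made up for the example; the real sleep duration is not stated in this thread.

```java
public class SteadyStateSketch {
    // Hypothetical settle delay; the value used in the actual fix is not given here.
    static final long SETTLE_MILLIS = 100;

    // Polls until the JVM reports the thread as BLOCKED, then sleeps briefly so
    // the thread has a chance to finish blocking at the OS level.
    static Thread.State waitUntilBlocked(Thread t) throws InterruptedException {
        while (t.getState() != Thread.State.BLOCKED) {
            Thread.sleep(10);
        }
        // The state can be reported before the native blocking call completes,
        // which is the window the PR description suspects.
        Thread.sleep(SETTLE_MILLIS);
        return t.getState();
    }

    public static void main(String[] args) throws InterruptedException {
        Object lock = new Object();
        // Stand-in for the steady-state thread: it just tries to take the lock.
        Thread steady = new Thread(() -> {
            synchronized (lock) { }
        }, "SteadyStateThread");
        steady.setDaemon(true);

        Thread.State observed;
        synchronized (lock) {
            // The main thread holds the lock, so the other thread must block on it.
            steady.start();
            observed = waitUntilBlocked(steady);
        }
        steady.join();
        System.out.println("observed state: " + observed);
    }
}
```

The extra sleep cannot prove the thread is idle; it only narrows the window, which is consistent with treating the change as harmless and watching the long-term results.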