Re: [15?] RFR (S): 8249192: MonitorInfo stores raw oops across safepoints

David Holmes Tue, 21 Jul 2020 17:47:24 -0700

Hi Thomas,

I've looked at the incremental update and I am happy with that.

I also, prompted by you mentioning it, took a deeper look at the biased-locking code to ensure it also keeps the MonitorInfo's thread-confined, and to see whether the handshake versions could themselves be susceptible to interference from safepoints (which they can't as far as I can determine). And that all seems fine.

As per offline discussions I know that there has been an alternate proposal for a completely localized fix in the stackwalker code that simply retrieves the list of monitors, uses the length to create the array, then re-retrieves the list of monitors to populate the array (the length of which can't change as we are dealing with the current thread). My only concern with that approach is the performance impact if we have deep stacks with lots of monitors. There is a microbenchmark for StackWalker in the repo:


open/test/micro/org/openjdk/bench/java/lang/StackWalkBench.java

but it doesn't test anything to do with monitor usage.
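
For illustration only, the retrieve / allocate / re-retrieve shape of that alternate proposal could look roughly like the toy C++ model below. This is not the proposed patch and not HotSpot code: ToyFrame, its monitors() method, the walks counter and monitors_to_array_two_pass() are all made-up names. The sketch assumes that the expensive part is walking a frame to collect its monitors, which is why it counts the walks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a stack frame; not a HotSpot type.
struct ToyFrame {
    std::vector<int> monitor_owners;  // identities of the locked objects
    mutable int walks = 0;            // how often the monitor list was built

    // Models the (assumed costly) retrieval of the frame's monitor list.
    std::vector<int> monitors() const {
        ++walks;
        return monitor_owners;
    }
};

// Two-pass variant: nothing read before the allocation is used after it.
std::vector<int> monitors_to_array_two_pass(const ToyFrame& frame) {
    // Pass 1: only the length is kept from the first retrieval.
    std::size_t length = frame.monitors().size();

    // The allocation that, in the real VM, could trigger a collection.
    std::vector<int> result(length);

    // Pass 2: re-read the list after the allocation and copy it over.
    std::vector<int> fresh = frame.monitors();
    for (std::size_t i = 0; i < length; ++i) {
        result[i] = fresh[i];
    }
    return result;
}
```

In this model every call walks the frame twice, which is the cost that would add up for deep stacks with many monitors.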

Thanks,
David
-----

On 22/07/2020 4:00 am, Thomas Schatzl wrote:

Hi Coleen and David,

   thanks for your reviews.

On 21.07.20 03:29, David Holmes wrote:
Hi Thomas,

On 21/07/2020 12:49 am, Thomas Schatzl wrote:
Forwarding to hotspot-dev where it belongs after wrongly sending to hotspot-gc-dev.
This touches serviceability code as well so cc'ing for good measure.

Thanks for taking this one on as it wasn't actually a GC issue!
I have been looking into strange G1 crashes with changed young gen sizing in tier8, which looked very similar to the crashes in that StackWalker/LocalsAndOperands.java test (thanks to Dean Long making us aware of these crashes). I guessed correctly in hindsight, that the main difference between C2 and Graal would be GC timing, so the problem could be related. Also we have some P2s for 15 with the same stack trace as in this reproducer and the mentioned tier8 (Kitchensink, 24h dacapo) failures that never reproduced... given that the issue reproduced much quicker in LocalsAndOperands (and given its source code gives a pretty narrow area where to look) it seemed easier to start with this one...
-------- Forwarded Message --------
Subject: [15?] RFR (S): 8249192: MonitorInfo stores raw oops across safepoints
Date: Mon, 20 Jul 2020 12:07:38 +0200
From: Thomas Schatzl <[email protected]>
To: [email protected] <[email protected]>

Hi all,
can I get some reviews to handle'ize some raw oops in the MonitorInfo class?
(Afaiu only) in LiveFrameStream::monitors_to_object_array() we try to allocate an objArray with raw oops held in the MonitorInfo class that are passed in a GrowableArray. This allocation can lead to a garbage collection, with the usual random crashes.
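
As a rough illustration of that failure mode, here is a toy C++ model of a moving collector. Nothing in it is HotSpot code: ToyHeap, RawOop, Handle and both helper functions are invented for the sketch, and "addresses" are modelled as vector indices so that a stale reference reads the wrong cell instead of invoking undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: none of these types exist in HotSpot.
using RawOop = std::size_t;  // "address" = index into ToyHeap::cells
using Handle = std::size_t;  // index into ToyHeap::handle_area

struct ToyHeap {
    std::vector<int> cells;           // each cell stores an object's identity
    std::vector<RawOop> handle_area;  // the roots the collector knows about

    RawOop allocate(int identity) {
        cells.push_back(identity);
        return cells.size() - 1;
    }
    Handle make_handle(RawOop oop) {
        handle_area.push_back(oop);
        return handle_area.size() - 1;
    }
    int identity_of(RawOop oop) const { return cells[oop]; }
    RawOop resolve(Handle h) const { return handle_area[h]; }

    // Simulated moving collection: every object shifts up by one cell
    // (cell 0 becomes filler, marked -1), and only known roots are fixed up.
    void collect() {
        cells.insert(cells.begin(), -1);
        for (RawOop& root : handle_area) ++root;
    }
};

// Keeps a raw oop across a collection: the index is stale afterwards.
int owner_via_raw_oop(ToyHeap& heap) {
    RawOop owner = heap.allocate(42);
    heap.collect();  // stands in for the GC triggered by the array allocation
    return heap.identity_of(owner);
}

// Keeps a handle across a collection: the collector updated its slot.
int owner_via_handle(ToyHeap& heap) {
    Handle owner = heap.make_handle(heap.allocate(42));
    heap.collect();
    return heap.identity_of(heap.resolve(owner));
}
```

On a fresh ToyHeap, owner_via_raw_oop() returns the filler value -1 rather than 42, while owner_via_handle() returns 42. In the real VM the stale read would be into arbitrary memory, hence the random crashes.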
Right - seems so obvious now. <sigh>
Took me a while to convince myself no such similar problem was lurking in the JVM TI code.
This change changes the raw oops in MonitorInfo to Handles,
My main concern here was whether the MonitorInfo objects are thread confined. For the StackWalker API we are always dealing with the current thread so that is fine. For JVM TI, in mainline, we may be executing code in the calling thread or the target thread; and in older releases it will be the VMThread at a safepoint. But it seems that the MonitorInfo's are confined to whichever thread that is, and so Handle usage is safe.
and adds a few HandleMarks along the way to make these handles go away asap.
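
A minimal sketch of the scoping idea behind such a mark, as a toy C++ RAII class. HotSpot's real HandleArea and HandleMark are more involved; ToyHandleArea, ToyHandleMark and handles_live_inside_scope() are made-up names used only to show how handles created inside a marked scope are released when the scope ends.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; not the HotSpot implementation.
struct ToyHandleArea {
    std::vector<int> slots;  // one slot per live handle

    std::size_t push(int referent) {
        slots.push_back(referent);
        return slots.size() - 1;
    }
};

// Remembers the area's size on entry and truncates back to it on exit.
class ToyHandleMark {
public:
    explicit ToyHandleMark(ToyHandleArea& area)
        : area_(area), saved_size_(area.slots.size()) {}
    ~ToyHandleMark() { area_.slots.resize(saved_size_); }

    ToyHandleMark(const ToyHandleMark&) = delete;
    ToyHandleMark& operator=(const ToyHandleMark&) = delete;

private:
    ToyHandleArea& area_;
    std::size_t saved_size_;
};

// Creates `count` handles inside a marked scope and returns how many
// handles were live in the area just before the scope ended.
std::size_t handles_live_inside_scope(ToyHandleArea& area, int count) {
    ToyHandleMark mark(area);
    for (int i = 0; i < count; ++i) {
        area.push(i);
    }
    return area.slots.size();
}
```

With one handle already in the area, a call with count 3 reports 4 live handles inside the scope, and the area is back to 1 once the call returns.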
That, and the ResourceMark changes, were a bit hard to follow. Basically a HandleMark is now present in the scope of, or just above, the call to monitors(). The need for the additional ResourceMarks is far from clear though. In particular I wonder if the RM introduced in Deoptimization::revoke_from_deopt_handler interacts with the special DeoptResourceMark in its caller Deoptimization::fetch_unroll_info_helper? (I have no idea what a DeoptResourceMark is.)
The DeoptResourceMark in this case seems to act just like a regular ResourceMark, using the thread's resource area.
Other than acting like a ResourceMark, the DeoptResourceMark only seems to be an indicator used in some asserts to verify that no further deoptimization is running.
Looking at the called BiasedLocking::revoke_own_lock(), it adds its own (regular) ResourceMark quite early (further indicating that normal ResourceMarks and DeoptResourceMarks should be "compatible"), meaning that only any resource object allocated in the resource area between the added ResourceMark in Deoptimization::revoke_from_deopt_handler() and the existing one in BiasedLocking::revoke_own_lock() would have a different lifetime than before. There is no resource object allocation in there, actually the only thing that happens is unpacking the contents of the passed handle.
The objects_to_revoke array in Deoptimization::revoke_from_deopt_handler does not escape the method too.
I was much more worried about the caching of a GrowableArray<MonitorInfo> in Thread::cached_monitor_info going on during biased locking...
This issue has been introduced in JDK-8140450: Implement Stack-Walking API in jdk9.
The CR has been triaged as P3, but I would like to ask whether it might be good to increase its priority to P2 and apply for inclusion in 15. My arguments are as follows:
- the original issue why I started looking at this were lots of seemingly random crashes (5 or 6 were reported and the change temporarily backed out for this reason) in tier8 with a g1 change that changed young gen sizing. These crashes including that young gen sizing change are all gone now with this bugfix. I.e. this suggests that so far we seem to have not encountered this issue more frequently due to pure luck wrt generation sizing.
- it affects all collectors (naturally).
- there are quite a few user reported random crashes with IntelliJ and variants, which due to the nature of IDEs tending to retrieve stack traces fairly frequently would be more affected than usual. So I suspect at least some of them to be caused by this issue, these are the only raw oops I am aware of.
My understanding of the cause and fix is fairly good, but I am no expert in this area, so I would like to defer to you about this suggestion. The change is imo important enough to be backported to 11 and 15 anyway, but the question is about the risk/reward tradeoff wrt bringing it to 15 and not 15.0.1.
I'd classify this as a P2 without doubt. As Dan noted there is no workaround as such.
CR:
https://bugs.openjdk.java.net/browse/JDK-8249192
Webrev:
http://cr.openjdk.java.net/~tschatzl/8249192/webrev/
src/hotspot/share/runtime/deoptimization.cpp
The code in collect_monitors takes the monitor owner oop and Handle-ises it to add to its own GrowableArray of Handles. Is it worth exposing the MonitorInfo owner() in Handle form to avoid this unwrapping and re-wrapping?
We talked a bit about this internally and came to the conclusion to initially provide a small fix for 15 and 16, and do that suggested refactoring in 16 only.
src/hotspot/share/runtime/vframe.hpp
I agree with Coleen that the MonitorInfo constructor should not take a Thread* but should itself materialize and use Thread::current().
Fixed.

New webrevs (jdk16):

http://cr.openjdk.java.net/~tschatzl/8249192/webrev.0_to_1/ (diff)
http://cr.openjdk.java.net/~tschatzl/8249192/webrev.1/ (full)

jdk15:

http://cr.openjdk.java.net/~tschatzl/8249192/webrev.jdk15.1/

The only difference is that JDK-8247729 in 16 changed a

ResourceMark rm;

in jdk15 (jvmtiEnvBase.cpp:1029) to

ResourceMark rm(current_thread);

in jdk16 (jvmtiEnvBase.cpp:1008)
which gives a merge error now. See also https://hg.openjdk.java.net/jdk/jdk/rev/f8a9be0f9e1a#l2.82 .
Started another tier1-5 run for jdk15; both versions passed 1.2k iterations of the LocalsAndOperands.java test again.
Thanks,
   Thomas
