Hi Thomas,
On 21/07/2020 12:49 am, Thomas Schatzl wrote:
Forwarding to hotspot-dev where it belongs after wrongly sending to
hotspot-gc-dev.
This touches serviceability code as well so cc'ing for good measure.
Thanks for taking this one on as it wasn't actually a GC issue!
-------- Forwarded Message --------
Subject: [15?] RFR (S): 8249192: MonitorInfo stores raw oops across
safepoints
Date: Mon, 20 Jul 2020 12:07:38 +0200
From: Thomas Schatzl <thomas.scha...@oracle.com>
To: hotspot-...@openjdk.java.net <hotspot-gc-...@openjdk.java.net>
Hi all,
can I get some reviews to handle'ize some raw oops in the MonitorInfo
class?
(Afaiu only) in LiveFrameStream::monitors_to_object_array() we try to
allocate an objArray with raw oops held in the MonitorInfo class that
are passed in a GrowableArray. This allocation can lead to a garbage
collection, with the usual random crashes.
Right - seems so obvious now. <sigh>
Took me a while to convince myself no such similar problem was lurking
in the JVM TI code.
This change changes the raw oops in MonitorInfo to Handles,
My main concern here was whether the MonitorInfo objects are thread
confined. For the StackWalker API we are always dealing with the current
thread so that is fine. For JVM TI, in mainline, we may be executing
code in the calling thread or the target thread; and in older releases
it will be the VMThread at a safepoint. But it seems that the
MonitorInfo's are confined to whichever thread that is, and so Handle
usage is safe.
and adds a few HandleMarks along the way to make these handles go away asap.
That, and the ResourceMark changes, were a bit hard to follow. Basically
a HandleMark is now present in the scope of, or just above, the call to
monitors(). The need for the additional ResourceMarks is far from clear
though. In particular I wonder if the RM introduced in
Deoptimization::revoke_from_deopt_handler interacts with the special
DeoptResourceMark in its caller
Deoptimization::fetch_unroll_info_helper? (I have no idea what a
DeoptResourceMark is.)
This issue has been introduced in JDK-8140450: Implement Stack-Walking
API in jdk9.
The CR has been triaged as P3, but I would like to ask whether it might
be good to increase its priority to P2 and apply for inclusion in 15. My
arguments are as follows:
- the original issue why I started looking at this were lots of
seemingly random crashes (5 or 6 were reported and the change
temporarily backed out for this reason) in tier8 with a g1 change that
changed young gen sizing. These crashes including that young gen sizing
change are all gone now with this bugfix.
I.e. this suggests that so far we seem to have not encountered this
issue more frequently due to pure luck wrt to generation sizing.
- it affects all collectors (naturally).
- there are quite a few user reported random crashes with IntelliJ and
variants, which due to the nature of IDEs tending to retrieve stack
traces fairly frequently would be more affected than usual. So I suspect
at least some of them to be caused by this issue, these are the only raw
oops I am aware of.
My understanding of the cause and fix is fairly good, but I am no expert
in this area, so I would like to defer to you about this suggestion. The
change is imo important enough to be backported to 11 and 15 anyway, but
the question is about the risk/reward tradeoff wrt to bringing it to 15
and not 15.0.1.
I'd classify this as a P2 without doubt. As Dan noted there is no
workaround as such.
CR:
https://bugs.openjdk.java.net/browse/JDK-8249192
Webrev:
http://cr.openjdk.java.net/~tschatzl/8249192/webrev/
src/hotspot/share/runtime/deoptimization.cpp
The code in collect_monitors takes the monitor owner oop and Handelises
it to add to its own GrowableArray of Handles. Is it worth exposing the
MonitorInfo owner() in Handle form to avoid this unwrapping and re-wrapping?
src/hotspot/share/runtime/vframe.hpp
I agree with Coleen that the MonitorInfo constructor should not take a
Thread* but should itself materialize and use Thread::current().
Thanks,
David
-----
Testing:
tier1-5,tier8 (with some unrelated changes), 1800+ runs of the reproducer
Thanks,
Thomas