|
Hi Coleen,
+1
Thanks,
Serguei
On 11/22/19 14:53, Daniel D. Daugherty wrote:
http://cr.openjdk.java.net/~coleenp/2019/8173361.04.incr/webrev
src/hotspot/share/prims/jvmtiImpl.cpp
No comments.
src/hotspot/share/prims/jvmtiImpl.hpp
No comments.
src/hotspot/share/runtime/serviceThread.cpp
No comments.
Thumbs up.
Dan
Dan, Thank you for reviewing this!
On 11/22/19 12:49 PM, Daniel D.
Daugherty wrote:
Hi
Coleen,
Sorry for the delay in getting back to this re-review.
Please review a new version of this change that keeps the
nmethod from being unloaded, after it is added to the
deferred event queue:
http://cr.openjdk.java.net/~coleenp/2019/8173361.03/webrev/index.html
src/hotspot/share/code/nmethod.cpp
No comments.
src/hotspot/share/oops/instanceKlass.cpp
No comments.
src/hotspot/share/prims/jvmtiExport.cpp
No comments.
src/hotspot/share/prims/jvmtiImpl.cpp
Nice solution with the new oops_do() and nmethods_do()
functions!
Erik's insistance!
old L988: void JvmtiDeferredEventQueue::enqueue(const
JvmtiDeferredEvent& event) {
new L998: void
JvmtiDeferredEventQueue::enqueue(JvmtiDeferredEvent event) {
Not sure why this was changed.
Update: Looks like Serguei raised the issue and
Coleen has already
resolved it.
Yes.
src/hotspot/share/prims/jvmtiImpl.hpp
old L494: QueueNode(const JvmtiDeferredEvent&
event)
new L498: QueueNode(JvmtiDeferredEvent& event)
Why was this changed?
Update: Not clear if this was covered by Coleen's
reply to Serguei.
old L497: const JvmtiDeferredEvent& event()
const { return _event; }
new L501: JvmtiDeferredEvent& event() { return
_event; }
Why was this changed?
Update: Coleen's reply to Serguei explained this.
Perhaps add:
// Not const because of oops_do() and
nmethods_do().
old L509: static void enqueue(const
JvmtiDeferredEvent& event) NOT_JVMTI_RETURN;
new L513: static void enqueue(JvmtiDeferredEvent
event) NOT_JVMTI_RETURN;
Why was this changed?
Update: Looks like Serguei raised the issue and
Coleen has already
resolved it.
Yes, I fixed these.
src/hotspot/share/runtime/mutexLocker.cpp
This change is going to require some testing to make
sure we don't
have any new deadlock scenarios.
Luckily, I've previously added an implicit NoSafepointVerifier
to locks that are _allow_vm_block = true, like this one.
+ def(JmethodIdCreation_lock , PaddedMutex , leaf, true, _safepoint_check_never); // used for creating jmethodIDs.
which prevents one class of deadlock. If we
take out another lock with a higher rank, we'll get the
ranking assert.
This lock prevents insertion into an array,
and has little outside calls.
I'm running tests in tier 1-6 but any code
that travels through this should get these assertion checks,
rather than deadlocking.
src/hotspot/share/runtime/serviceThread.cpp
L50 - nit - why the extra blank line?
To separate static data member definitions from functions. I
removed it.
src/hotspot/share/runtime/serviceThread.hpp
Thanks for cleaning up the static:
ServiceThread::is_service_thread(Thread* thread)
stuff. Having it be different than the other threads was
a bit jarring.
src/hotspot/share/runtime/thread.hpp
No comments.
Thumbs up. My only comments are nits so I don't need to see
a
new webrev if you decide to fix them.
So it turns out that in stress testing my fix for https://bugs.openjdk.java.net/browse/JDK-8212160
Because I was in the area and thought this was a duplicate of
that bug (it is not). I found that calling oops_do and
nmethods_do the ServiceThread needs to hold the Service_lock,
because other threads can be adding things to the global queue
while the sweeper thread is calling this in a handshake.
I am now retesting this change with the changes above, and with
the Service_lock. So far my stress tests for JDK-81212160 and
the stress test for this bug pass, but I'm going to run through
all the tiers 1-6 over the weekend.
Please have a look at the changes in the meantime.
http://cr.openjdk.java.net/~coleenp/2019/8173361.04.incr/webrev
http://cr.openjdk.java.net/~coleenp/2019/8173361.04/webrev
Thanks,
Coleen
Dan
Ran the test that failed 100 times without failure, tier1 on
Oracle supported platforms, and tier2-3 including jvmti and
jdi tests locally.
See bug for more details about the crash.
https://bugs.openjdk.java.net/browse/JDK-8173361
Thanks,
Coleen
Hi Serguei,
Sorry for not sending an update. I talked to Erik and am
working on a version that keeps the nmethod from being
unloaded while it's in the deferred event queue, with a
version that the GC people will like, and I like. I'm
testing it out now.
Thanks!
Coleen
Hi Coleen,
Sorry for the latency, I had to investigate it a
little bit.
I still have some doubt your fix is right thing to do.
On 11/16/19 04:55, [email protected]
wrote:
Hi Coleen,
Hi, I've been working on answers to these
questions, so I'll start with this one.
The nmethodLocker keeps the nmethod from being
reclaimed (made_zombie or memory released) by the
sweeper, but the nmethod could be unloaded.
Unloading the nmethod clears the Method* _method
field.
Yes, I see it is done in the
nmethod::make_unloaded().
The post_compiled_method_load event needs the
_method field to look at things like inlining and
ScopeDesc fields. If the nmethod is unloaded,
some of the oops are dead. There are "holder"
oops that correspond to the metadata in the
nmethod. If these oops are dead, causing the
nmethod to get unloaded, then the metadata may not
be valid.
So my change 02 looks for a NULL nmethod._method
field to tell whether we can post information
about the nmethod.
There's code in nmethod.cpp like:
jmethodID nmethod::get_and_cache_jmethod_id() {
if (_jmethod_id == NULL) {
// Cache the jmethod_id since it can no longer
be looked up once the
// method itself has been marked for
unloading.
_jmethod_id = method()->jmethod_id();
}
return _jmethod_id;
}
Which was added when post_method_load and unload
were turned into deferred events.
Could we cache the jmethodID in the
JvmtiDeferredEvent::compiled_method_load_event
similarly as we do in the
JvmtiDeferredEvent::compiled_method_unload_event?
This would help to get rid of the dependency on the
nmethod::_method.
Do we depend on any other nmethod fields?
Yes, there are other nmethod metadata that we rely on
to print inline information, and this function
JvmtiCodeBlobEvents::build_jvmti_addr_location_map
because it uses the ScopeDesc data in the nmethod.
One possible approach is to prepare and cache all this
information
in the nmethod::post_compiled_method_load_event() before
the
JvmtiDeferredEvent::compiled_method_load_event() is
called.
The event parameters are:
typedef struct {
const void* start_address;
jlocation location;
} jvmtiAddrLocationMap;
CompiledMethodLoad(jvmtiEnv *jvmti_env,
jmethodID method,
jint code_size,
const void* code_addr,
jint map_length,
const jvmtiAddrLocationMap* map,
const void* compile_info)
Some of these addresses above could be not accessible
when an event is posted.
Not sure yet if it is Okay.
The question is if this kind of refactoring is worth and
right thing to do.
We do cache the jmethodID but that's not good enough.
See my last comment in the bug report. The jmethodID
can point to an unloaded method.
This looks like it is done a little bit late.
It'd better to do it before the event is deferred (see
above).
I tried a version of keeping the nmethod alive, but
the GC folks will hate it. And it doesn't work and I
hate it.
From serviceability point of view this is the best and
most consistent approach.
I seems to me, it was initially designed this way.
The downside is it adds some extra complexity to the GC.
My version 01 is the best, with the caveat that maybe
it should check for _method == NULL instead of
nmethod->is_alive(). I have to talk to Erik to see
if there's a race with concurrent class unloading.
Any application that depends on a compiled method
loading event on a class that could be unloaded is a
buggy application. Applications should not rely on
when the JIT compiler decides to compile a method!
This happens to us for a stress test. Most
applications will get most of their compiled method
loading events as they normally do.
It is not an application that relies on the compiled
method loading event.
It is about profiling tools to be able to get correct
information about what is going on with compilations.
My concern is that if we skip such compiled method load
events then profilers have no way
to find out there many unneeded compilations that are
thrown away without any real use.
Also, it is not clear what happens with the subsequent
compiled method unload events.
Are they going to be skipped as well or they can appear
and confuse profilers?
Thanks,
Serguei
Thanks,
Coleen
Thanks,
Serguei
I put more debugging in the bug to show this crash
was from an unloaded nmethod.
Coleen
Hi Coleen,
I have some questions.
Both the compiler method load and unload are
posted as deferred events.
Both events keep the nmethod alive until the
ServiceThread processes the event.
The implementation is:
JvmtiDeferredEvent
JvmtiDeferredEvent::compiled_method_load_event(nmethod*
nm) {
. . .
// Keep the nmethod alive until the
ServiceThread can process
// this deferred event.
nmethodLocker::lock_nmethod(nm);
return event;
}
JvmtiDeferredEvent
JvmtiDeferredEvent::compiled_method_unload_event(nmethod*
nm, jmethodID id, const void* code) {
. . .
// Keep the nmethod alive until the
ServiceThread can process
// this deferred event. This will keep the
memory for the
// generated code from being reused too early.
We pass
// zombie_ok == true here so that our nmethod
that was just
// made into a zombie can be locked.
nmethodLocker::lock_nmethod(nm, true /*
zombie_ok */);
return event;
}
void JvmtiDeferredEvent::post() {
assert(ServiceThread::is_service_thread(Thread::current()),
"Service thread must post enqueued
events");
switch(_type) {
case TYPE_COMPILED_METHOD_LOAD: {
nmethod* nm =
_event_data.compiled_method_load;
JvmtiExport::post_compiled_method_load(nm);
// done with the deferred event so unlock
the nmethod
nmethodLocker::unlock_nmethod(nm);
break;
}
case TYPE_COMPILED_METHOD_UNLOAD: {
nmethod* nm =
_event_data.compiled_method_unload.nm;
JvmtiExport::post_compiled_method_unload(
_event_data.compiled_method_unload.method_id,
_event_data.compiled_method_unload.code_begin);
// done with the deferred event so unlock
the nmethod
nmethodLocker::unlock_nmethod(nm);
break;
}
. . .
}
}
Then I wonder how is it possible for the nmethod
to be not alive here?:
2168 void JvmtiExport::post_compiled_method_load(nmethod *nm) {
. . .
2173 // It's not safe to look at metadata for unloaded methods.
2174 if (!nm->is_alive()) {
2175 return;
2176 }
At least, it lokks like something else is
broken.
Do I miss something important here?
Thanks,
Serguei
Summary:
Don't post information which uses metadata
from unloaded nmethods
Tested tier1-3 and 100 times with test that
failed (reproduced failure without the fix).
open webrev at http://cr.openjdk.java.net/~coleenp/2019/8173361.01/webrev
bug link https://bugs.openjdk.java.net/browse/JDK-8173361
Thanks,
Coleen
|