Re: RFR (S) 8173361: various crashes in JvmtiExport::post_compiled_method_load

coleen . phillimore Fri, 22 Nov 2019 15:17:00 -0800


Thanks for the re-review, Dan!
Coleen


On 11/22/19 5:53 PM, Daniel D. Daugherty wrote:

http://cr.openjdk.java.net/~coleenp/2019/8173361.04.incr/webrev
src/hotspot/share/prims/jvmtiImpl.cpp
    No comments.

src/hotspot/share/prims/jvmtiImpl.hpp
    No comments.

src/hotspot/share/runtime/serviceThread.cpp
    No comments.

Thumbs up.

Dan


On 11/22/19 2:15 PM, [email protected] wrote:
Dan, Thank you for reviewing this!

On 11/22/19 12:49 PM, Daniel D. Daugherty wrote:
Hi Coleen,

Sorry for the delay in getting back to this re-review.


On 11/21/19 9:12 AM, [email protected] wrote:
Please review a new version of this change that keeps the nmethodfrom being unloaded, after it is added to the deferred event queue:
http://cr.openjdk.java.net/~coleenp/2019/8173361.03/webrev/index.html
src/hotspot/share/code/nmethod.cpp
    No comments.

src/hotspot/share/oops/instanceKlass.cpp
    No comments.

src/hotspot/share/prims/jvmtiExport.cpp
    No comments.

src/hotspot/share/prims/jvmtiImpl.cpp
    Nice solution with the new oops_do() and nmethods_do() functions!
Erik's insistance!
old L988: void JvmtiDeferredEventQueue::enqueue(constJvmtiDeferredEvent& event) { new L998: voidJvmtiDeferredEventQueue::enqueue(JvmtiDeferredEvent event) {
        Not sure why this was changed.
Update: Looks like Serguei raised the issue and Coleen hasalready
        resolved it.
Yes.
src/hotspot/share/prims/jvmtiImpl.hpp
    old L494:     QueueNode(const JvmtiDeferredEvent& event)
    new L498:     QueueNode(JvmtiDeferredEvent& event)
        Why was this changed?
Update: Not clear if this was covered by Coleen's reply toSerguei.
old L497: const JvmtiDeferredEvent& event() const { return_event; }
    new L501:     JvmtiDeferredEvent& event() { return _event; }
        Why was this changed?

        Update: Coleen's reply to Serguei explained this. Perhaps add:
                  // Not const because of oops_do() and nmethods_do().
old L509: static void enqueue(const JvmtiDeferredEvent& event)NOT_JVMTI_RETURN; new L513: static void enqueue(JvmtiDeferredEvent event)NOT_JVMTI_RETURN;
        Why was this changed?
Update: Looks like Serguei raised the issue and Coleen hasalready
        resolved it.
Yes, I fixed these.
src/hotspot/share/runtime/mutexLocker.cpp
    This change is going to require some testing to make sure we don't
    have any new deadlock scenarios.
Luckily, I've previously added an implicit NoSafepointVerifier tolocks that are _allow_vm_block = true, like this one.+ def(JmethodIdCreation_lock , PaddedMutex , leaf, true,_safepoint_check_never); // used for creating jmethodIDs.which prevents one class of deadlock. If we take out another lockwith a higher rank, we'll get the ranking assert.
This lock prevents insertion into an array, and has little outside calls.
I'm running tests in tier 1-6 but any code that travels through thisshould get these assertion checks, rather than deadlocking.
src/hotspot/share/runtime/serviceThread.cpp
    L50 - nit - why the extra blank line?
To separate static data member definitions from functions.  I removed it.
src/hotspot/share/runtime/serviceThread.hpp
    Thanks for cleaning up the static:

      ServiceThread::is_service_thread(Thread* thread)

    stuff. Having it be different than the other threads was
    a bit jarring.

src/hotspot/share/runtime/thread.hpp
    No comments.

Thumbs up. My only comments are nits so I don't need to see a
new webrev if you decide to fix them.
So it turns out that in stress testing my fixforhttps://bugs.openjdk.java.net/browse/JDK-8212160
Because I was in the area and thought this was a duplicate of thatbug (it is not). I found that calling oops_do and nmethods_do theServiceThread needs to hold the Service_lock, because other threadscan be adding things to the global queue while the sweeper thread iscalling this in a handshake.
I am now retesting this change with the changes above, and with theService_lock. So far my stress tests for JDK-81212160 and thestress test for this bug pass, but I'm going to run through all thetiers 1-6 over the weekend.
Please have a look at the changes in the meantime.

http://cr.openjdk.java.net/~coleenp/2019/8173361.04.incr/webrev
http://cr.openjdk.java.net/~coleenp/2019/8173361.04/webrev

Thanks,
Coleen
Dan
Ran the test that failed 100 times without failure, tier1 on Oraclesupported platforms, and tier2-3 including jvmti and jdi tests locally.
See bug for more details about the crash.

https://bugs.openjdk.java.net/browse/JDK-8173361

Thanks,
Coleen

On 11/18/19 10:09 PM, [email protected] wrote:
Hi Serguei,
Sorry for not sending an update. I talked to Erik and am workingon a version that keeps the nmethod from being unloaded while it'sin the deferred event queue, with a version that the GC peoplewill like, and I like. I'm testing it out now.
Thanks!
Coleen


On 11/18/19 10:03 PM, [email protected] wrote:
Hi Coleen,

Sorry for the latency, I had to investigate it a little bit.
I still have some doubt your fix is right thing to do.


On 11/16/19 04:55, [email protected] wrote:
On 11/15/19 11:17 PM, [email protected] wrote:
Hi Coleen,

On 11/15/19 2:12 PM, [email protected] wrote:
Hi, I've been working on answers to these questions, so I'llstart with this one.
The nmethodLocker keeps the nmethod from being reclaimed(made_zombie or memory released) by the sweeper, but thenmethod could be unloaded. Unloading the nmethod clears theMethod* _method field.
Yes, I see it is done in the nmethod::make_unloaded().
The post_compiled_method_load event needs the _method field tolook at things like inlining and ScopeDesc fields. If thenmethod is unloaded, some of the oops are dead. There are"holder" oops that correspond to the metadata in the nmethod. If these oops are dead, causing the nmethod to get unloaded,then the metadata may not be valid.
So my change 02 looks for a NULL nmethod._method field to tellwhether we can post information about the nmethod.
There's code in nmethod.cpp like:

jmethodID nmethod::get_and_cache_jmethod_id() {
  if (_jmethod_id == NULL) {
// Cache the jmethod_id since it can no longer be lookedup once the
    // method itself has been marked for unloading.
    _jmethod_id = method()->jmethod_id();
  }
  return _jmethod_id;
}
Which was added when post_method_load and unload were turnedinto deferred events.
Could we cache the jmethodID in theJvmtiDeferredEvent::compiled_method_load_eventsimilarly as we do in theJvmtiDeferredEvent::compiled_method_unload_event?This would help to get rid of the dependency on thenmethod::_method.
Do we depend on any other nmethod fields?
Yes, there are other nmethod metadata that we rely on to printinline information, and this functionJvmtiCodeBlobEvents::build_jvmti_addr_location_map because ituses the ScopeDesc data in the nmethod.
One possible approach is to prepare and cache all this information
in the nmethod::post_compiled_method_load_event() before the
JvmtiDeferredEvent::compiled_method_load_event() is called.
The event parameters are:
typedef struct {
     const void* start_address;
     jlocation location;
} jvmtiAddrLocationMap;
CompiledMethodLoad(jvmtiEnv *jvmti_env,
             jmethodID method,
             jint code_size,
             const void* code_addr,
             jint map_length,
             const jvmtiAddrLocationMap* map,
             const void* compile_info)
Some of these addresses above could be not accessible when anevent is posted.
Not sure yet if it is Okay.
The question is if this kind of refactoring is worth and rightthing to do.
We do cache the jmethodID but that's not good enough. See mylast comment in the bug report. The jmethodID can point to anunloaded method.
This looks like it is done a little bit late.
It'd better to do it before the event is deferred (see above).
I tried a version of keeping the nmethod alive, but the GC folkswill hate it. And it doesn't work and I hate it.
From serviceability point of view this is the best and mostconsistent approach.
I seems to me, it was initially designed this way.
The downside is it adds some extra complexity to the GC.
My version 01 is the best, with the caveat that maybe it shouldcheck for _method == NULL instead of nmethod->is_alive(). Ihave to talk to Erik to see if there's a race with concurrentclass unloading.
Any application that depends on a compiled method loading eventon a class that could be unloaded is a buggy application. Applications should not rely on when the JIT compiler decides tocompile a method! This happens to us for a stress test. Mostapplications will get most of their compiled method loadingevents as they normally do.
It is not an application that relies on the compiled methodloading event.It is about profiling tools to be able to get correct informationabout what is going on with compilations.My concern is that if we skip such compiled method load eventsthen profilers have no wayto find out there many unneeded compilations that are thrown awaywithout any real use.Also, it is not clear what happens with the subsequent compiledmethod unload events.Are they going to be skipped as well or they can appear andconfuse profilers?
Thanks,
Serguei
Thanks,
Coleen
Thanks,
Serguei
I put more debugging in the bug to show this crash was from anunloaded nmethod.
Coleen


On 11/15/19 4:45 PM, [email protected] wrote:
Hi Coleen,

I have some questions.
Both the compiler method load and unload are posted asdeferred events.Both events keep the nmethod alive until the ServiceThreadprocesses the event.
The implementation is:
JvmtiDeferredEventJvmtiDeferredEvent::compiled_method_load_event(nmethod* nm) {
  . . .
  // Keep the nmethod alive until the ServiceThread can process
  // this deferred event.
  nmethodLocker::lock_nmethod(nm);
  return event;
}
JvmtiDeferredEventJvmtiDeferredEvent::compiled_method_unload_event(nmethod* nm,jmethodID id, const void* code) {
  . . .
  // Keep the nmethod alive until the ServiceThread can process
  // this deferred event. This will keep the memory for the
  // generated code from being reused too early. We pass
  // zombie_ok == true here so that our nmethod that was just
  // made into a zombie can be locked.
  nmethodLocker::lock_nmethod(nm, true /* zombie_ok */);
  return event;
}

void JvmtiDeferredEvent::post() {
assert(ServiceThread::is_service_thread(Thread::current()),
         "Service thread must post enqueued events");
  switch(_type) {
    case TYPE_COMPILED_METHOD_LOAD: {
      nmethod* nm = _event_data.compiled_method_load;
JvmtiExport::post_compiled_method_load(nm);
      // done with the deferred event so unlock the nmethod
      nmethodLocker::unlock_nmethod(nm);
      break;
    }
    case TYPE_COMPILED_METHOD_UNLOAD: {
      nmethod* nm = _event_data.compiled_method_unload.nm;
      JvmtiExport::post_compiled_method_unload(
_event_data.compiled_method_unload.method_id,
_event_data.compiled_method_unload.code_begin);
      // done with the deferred event so unlock the nmethod
      nmethodLocker::unlock_nmethod(nm);
      break;
    }
    . . .
  }
}
Then I wonder how is it possible for the nmethod to be notalive here?:
2168 void JvmtiExport::post_compiled_method_load(nmethod *nm) {
. . .
2173 // It's not safe to look at metadata for unloaded methods.
2174 if (!nm->is_alive()) {
2175 return;
2176 }
At least, it lokks like something else is broken.
Do I miss something important here?

Thanks,
Serguei


On 11/14/19 5:15 PM, [email protected] wrote:
Summary: Don't post information which uses metadata fromunloaded nmethods
Tested tier1-3 and 100 times with test that failed(reproduced failure without the fix).
open webrev athttp://cr.openjdk.java.net/~coleenp/2019/8173361.01/webrev
bug link https://bugs.openjdk.java.net/browse/JDK-8173361

Thanks,
Coleen

Re: RFR (S) 8173361: various crashes in JvmtiExport::post_compiled_method_load

Reply via email to