On Thu, Jun 1, 2023 at 9:17 AM Jaroslav Bachorík <
jaroslav.bacho...@datadoghq.com> wrote:

> Hi David,
>
> On Thu, Jun 1, 2023 at 3:56 AM David Holmes <david.hol...@oracle.com>
> wrote:
>
>> Hi Jaroslav,
>>
>> On 31/05/2023 9:12 pm, Jaroslav Bachorík wrote:
>> > Dear Team,
>> >
>> > I've been investigating the unusual JVM crashes occurring in JVMTI calls
>> > on a J9 JVM. During my investigation, I scrutinized the `jmethodID`
>> > definition closely, available here: [jmethodID
>> > definition](https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID).
>> >
>> > To paraphrase, the definition suggests that `jmethodID` identifies a
>> > Java method, initializer, or constructor. These identifiers, once
>> > returned by JVM TI functions and events, can be safely stored. However,
>> > when the class is unloaded, they become invalid, rendering them
>> > unsuitable for use.
>> >
>> > My interpretation is that the JVMTI user should verify the validity of a
>> > `jmethodID` value before using it to prevent potential crashes. Would
>> > you agree with this interpretation?
>>
>> Not quite - as you note you can't verify the jmethodID validity. What
>> the user needs to do, in line with what Dan was saying, is ensure that
>> they keep track of the classes to which the methods belong and keep them
>> alive if necessary. Now that may be easier said than done, but that is
>> the gist of it. This comes from the JNI spec:
>>
>> "A field or method ID does not prevent the VM from unloading the class
>> from which the ID has been derived. After the class is unloaded, the
>> method or field ID becomes invalid and may not be passed to any function
>> taking such an ID. The native code, therefore, must make sure to:
>>
>>      keep a live reference to the underlying class, or
>>      recompute the method or field ID
>>
>> if it intends to use a method or field ID for an extended period of time."
>>
>> > This sounds like a sensible requirement, but its practical application
>> > remains unclear. As far as I know, methods can be unloaded concurrently
>> > with the native code executing JVMTI functions. This introduces a
>> > potential race condition where the JVM unloads the methods during the
>> > check->use flow, making it only a partial solution. To complicate
>> > matters further, no method exists to confirm whether a `jmethodID` is
>> > valid.
>> >
>> > Theoretically, we could monitor the `CompiledMethodUnload` event to
>> > track the validity state, creating a constantly expanding set of
>> > unloaded `jmethodID` values or a bloom filter, if one does not care
>> > about a few potential false positives. This strategy, however, doesn't
>> > address the potential race condition, and it could even exacerbate it
>> > due to possible event delays. This delay might mistakenly validate a
>> > `jmethodID` value that has already been unloaded, but for which the
>> > event hasn't been delivered yet.
>> >
>> > Honestly, I don't see a way to use `jmethodID` safely unless the code
>> > using it suspends the entire JVM and doesn't resume until it's finished
>> > with that `jmethodID`. Any other approach might lead to JVM crashes, as
>> > we've observed with J9.
>> >
>> > Lastly, it's noteworthy that Hotspot takes meticulous measures to ensure
>> > that using jmethodIDs for unloaded methods doesn't crash the JVM and
>> > even provides useful information. This observation has led me to
>> > question whether the documentation aligns with the Hotspot
>> > implementation, especially given that closely following the
>> > documentation appears to increase the risk associated with the use of
>> > `jmethodID` values.
>>
>> There have been folk who wanted to make this area more user-friendly but
>> that shouldn't be mistaken for moving towards a world where jmethodIDs
>> are always safe to use.
>>
>
> Yes, I see your point. Unfortunately, this confirms my worries that using
> AsyncGetCallTrace (ASGCT) on a system strictly adhering to the JVMTI spec
> of jmethodID is not really possible without risking random and quite
> frequent crashes on systems with concurrent class unloading enabled.
> FTR, ASGCT will record the stack trace as a list of frames, each one
> containing the corresponding jmethodID value. Considering that the most
> common usage of ASGCT is in a signal handler it makes it impossible to use
> JVMTI calls to resolve the holder class and create a strong reference to
> prevent it from being unloaded.
> And even if this were possible, we would need to figure out when to
> release the class reference once it is no longer needed, and it is not
> really clear how we could do that reliably, leaving us with the option of
> holding the class references indefinitely or risking crashing the JVM.
>
> I want to emphasize that not being able to resolve additional details for
> a jmethodID pointing to a method of an unloaded class is not an issue, as
> long as the JVMTI call does not crash. I think that
> https://bugs.openjdk.org/browse/JDK-8268364 did address exactly the
> problem of concurrent class unloading causing races in the code that is
> checking for validity of jmethodID and then using it.
>
>
> Can this be summarized as follows: the user is not guaranteed to get any
> additional information for an invalid jmethodID, but it would be really nice
> for the JVM not to crash when a jmethodID becomes invalid, as there is no way
> for the user to check its validity atomically? And yes, even calling
> GetMethodDeclaringClass to obtain the class, so that one could create a
> strong reference to it, is subject to the same race, so it really cannot be
> used as a workaround.
>
>
I wonder about this. Looking at https://bugs.openjdk.org/browse/JDK-8268364,
we zero out the JNI handle table for the CLD earlier during class
unloading. IIUC ClassLoaderData::unload() can run concurrently with
Method::checked_resolve_jmethod_id, so could we still not end up in the same
situation (resolving the Method while the JNI table slot was still occupied,
but dereferencing the Method after it was freed), admittedly more rarely?

Freeing the Method would return its memory to metaspace, which may or may not
be uncommitted, so crashes would be intermittent and random.
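
The interleaving in question can be written down as a toy, single-threaded
model. This is only a sketch and not HotSpot code: `ToyMethod`, `ToyIdTable`,
`replay` and the `freed` flag are all made up for illustration, and the flag
merely stands in for memory that has gone back to the allocator, so that the
example never touches freed memory itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a VM-internal method record. `freed` simulates
// the record's memory having been released; a real VM has no such flag.
struct ToyMethod {
    std::string name;
    bool freed = false;
};

// Hypothetical stand-in for a per-class-loader ID table: each slot holds a
// pointer to a method, or nullptr once the slot has been cleared.
struct ToyIdTable {
    std::vector<ToyMethod*> slots;

    // "Checked resolve": hand back the slot contents, nullptr if cleared.
    ToyMethod* checked_resolve(std::size_t id) const {
        return id < slots.size() ? slots[id] : nullptr;
    }

    // "Unload": clear every slot first, then release the methods.
    void unload() {
        std::vector<ToyMethod*> doomed = slots;
        for (ToyMethod*& slot : slots) slot = nullptr;
        for (ToyMethod* m : doomed) {
            if (m != nullptr) m->freed = true;
        }
    }
};

enum class Outcome { Rejected, Safe, UseAfterFree };

// Replays one fixed interleaving of a reader and an unloader on one thread.
// `unload_between_check_and_use` models the unloader running after the
// reader's check has succeeded but before the reader dereferences the result.
Outcome replay(bool unload_before_check, bool unload_between_check_and_use) {
    ToyMethod method;
    method.name = "foo";
    ToyIdTable table;
    table.slots.push_back(&method);
    assert(table.checked_resolve(0) == &method);  // sanity check of the setup

    if (unload_before_check) table.unload();
    ToyMethod* resolved = table.checked_resolve(0);  // the "check"
    if (resolved == nullptr) return Outcome::Rejected;

    if (unload_between_check_and_use) table.unload();
    // The "use": in a real VM this would be the dereference of the method.
    return resolved->freed ? Outcome::UseAfterFree : Outcome::Safe;
}
```

In this model, clearing the slots early only helps readers whose check happens
after the unload: `replay(true, false)` is rejected cleanly, while
`replay(false, true)` still ends up using a method that has been released.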

This is a Coleen question :)

Cheers,
>
> -JB-
>
>
>>
>> Cheers,
>> David
>>
>> > I welcome your thoughts and perspectives on this matter.
>> >
>> > Best regards,
>> >
>> > Jaroslav
>>
>
