On Wed, 19 Jun 2024 15:06:25 GMT, Axel Boldt-Christmas <abold...@openjdk.org> 
wrote:

>> ClassLoaderDataGraph provides APIs for walking different metadata. All the 
>> iterators which are not designed to be used by the GC also keep the holder 
>> of the CLDs alive and by extensions keeps all metadata alive. This is 
>> problematic for concurrent GC as it keeps otherwise unreachable classes from 
>> being unloaded and the respective metadata freed. 
>> 
>> This patch changes the default iteration behaviour to not keep the holder 
>> alive, with the exception of `loaded_classes_do` (renamed 
>> `loaded_classes_do_keepalive`) and `modules_do` (renamed 
>> `modules_do_keepalive`) which is used by jvmti APIs that requires that the 
>> holder is kept alive.
>> 
>> All other uses consumes all the metadata it queries during its safepoint or 
>> before releasing the `ClassLoaderDataGraph_lock`. 
>> 
>> Before this change some jcmd, new jfr chunks and some jfr events, all of 
>> which consumed these APIs, could cause class unloading to not occur. 
>> 
>> Been running our internal stress test in an even more stressful mode which 
>> without this patch reproduces the metaspace OOME 
>> [JDK-8326005](https://bugs.openjdk.org/browse/JDK-8326005) consistently 
>> within a few hours. And after this patch it does not.
>> 
>> Currently running tier1-tier8 testing.
>
> Axel Boldt-Christmas has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Rename and comment SystemDictionary::methods_do

So the problem with the no-keepalive names is that for these functions it's 
very unclear what exactly you're keeping alive, unless you're thinking about 
concurrent collection GC, and even then it's a puzzler.  I don't see how it's 
going to help prevent more bugs.  I don't yet understand the current bug.  The 
'resolve' for the CLDG iterator was to temporarily keep that CLD from being 
unloaded, in the short time that we're iterating on that particular CLD.

The CLDG_lock was added later and it may be that that's what prevents the CLD 
from being *deleted* while we are holding the lock while iterating.  So that 
keeps the CLD metadata alive.  The oops in the CLD are not kept alive unless we 
have resolved the holder which the GC sees as the root.  So if we are accessing 
oops from the CLD, we need to use the keepalive version.  It's not really 
keeping the CLD alive.  It's making the oops alive, or accessible until they 
are handled somewhere by the caller.

So the no-keepalive is not a great name for this, even though it's used when 
accessing oops in other places.  I think the versions of these functions that 
access oops that need to be kept alive should have a different name, like 
modules_do_keeping_oops_alive().  But that's a sentence name.  Maybe 
modules_do_keepalive() can be read as keeping oops alive.  But 
classes_do_no_keepalive() is a terrible name because we want the class metadata 
to be kept alive and the name is disconcerting.

Usually I prefer good names to good comments, but here I think good comments 
are better.

-------------

PR Review: https://git.openjdk.org/jdk/pull/19769#pullrequestreview-2133534047

Reply via email to