On Thu, 28 Aug 2025 01:30:39 GMT, pf0n <[email protected]> wrote:
> ### Summary
>
> The new implementation of ObjectCountAfterGC for Shenandoah piggybacks off of
> the existing marking phases and records strongly marked objects in a
> histogram. If the event is disabled, the original marking closures are used.
> When enabled new mark-and-count closures are used by the worker threads. Each
> worker thread updates its local histogram as it marks an object. These local
> histograms are merged at the conclusion of the marking phase under a mutex.
> The event is emitted outside a safepoint. Because (most) Shenandoah's marking
> is done concurrently, so is the object counting work.
>
> ### Performance
> The performance test were ran using the Extremem benchmark on a default and
> stress workload. (will edit this section to include data after average time
> and test for GenShen)
>
> #### Default workload:
> ObjectCountAfterGC disabled (master branch):
> `[807.216s][info][gc,stats ] Pause Init Mark (G) = 0.003 s
> (a = 264 us)`
> `[807.216s][info][gc,stats ] Pause Init Mark (N) = 0.001 s
> (a = 91 us)`
> `[807.216s][info][gc,stats ] Concurrent Mark Roots = 0.041 s
> (a = 4099 us)`
> `[807.216s][info][gc,stats ] Concurrent Marking = 1.660 s
> (a = 166035 us)`
> `[807.216s][info][gc,stats ] Pause Final Mark (G) = 0.004 s
> (a = 446 us) `
> `[807.216s][info][gc,stats ] Pause Final Mark (G) = 0.004 s
> (a = 446 us) `
> `[807.216s][info][gc,stats ] Pause Final Mark (N) = 0.004 s
> (a = 357 us)`
>
> ObjectCountAfterGC disabled (feature branch):
> `[807.104s][info][gc,stats ] Pause Init Mark (G) = 0.003 s
> (a = 302 us)`
> `[807.104s][info][gc,stats ] Pause Init Mark (N) = 0.001 s
> (a = 92 us) `
> `[807.104s][info][gc,stats ] Concurrent Mark Roots = 0.048 s
> (a = 4827 us)`
> `[807.104s][info][gc,stats ] Concurrent Marking = 1.666 s
> (a = 166638 us) `
> `[807.104s][info][gc,stats ] Pause Final Mark (G) = 0.006 s
> (a = 603 us)`
> `[807.104s][info][gc,stats ] Pause Final Mark (N) = 0.005 s
> (a = 516 us)`
>
> ObjectCountAfterGC enabled (feature branch)
> `[807.299s][info][gc,stats ] Pause Init Mark (G) = 0.002 s
> (a = 227 us)`
> `[807.299s][info][gc,stats ] Pause Init Mark (N) = 0.001 s
> (a = 89 us) `
> `[807.299s][info][gc,stats ] Concurrent Mark Roots = 0.053 s
> (a = 5279 us)`
> `[807.299s][info][gc,st...
Left a few superficial suggestions. My big question is what work remains for
this event to also work with the generational mode?
src/hotspot/share/gc/shared/gcTrace.hpp line 136:
> 134: void report_gc_reference_stats(const ReferenceProcessorStats& rp)
> const;
> 135: void report_object_count_after_gc(BoolObjectClosure* object_filter,
> WorkerThreads* workers) NOT_SERVICES_RETURN;
> 136: // Report object count by not performing a heap inspection. This
> method will
s/by not/without
src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 179:
> 177: // Use object counting closure if ObjectCount or ObjectCountAfterGC
> event is enabled.
> 178: const bool object_count_enabled =
> ObjectCountEventSender::should_send_event();
> 179: if (object_count_enabled &&
> !ShenandoahHeap::heap()->mode()->is_generational()) {
Can the generational mode support this? It can also perform a _global_
collection.
src/hotspot/share/gc/shenandoah/shenandoahControlThread.cpp line 130:
> 128: // Create the KlassInfoTable for Shenandoah only if JFR is enabled.
> 129: #if INCLUDE_JFR
> 130: KlassInfoTable cit(false);
Might consider a `CADR` helper class here (Constructor Acquires, Destructor
Releases). There are a lot of lines of code between the two `set_cit` calls
(and the second one isn't guarded by `INCLUDE_JFR`).
src/hotspot/share/memory/heapInspection.hpp line 33:
> 31: #include "oops/objArrayOop.hpp"
> 32: #include "oops/oop.hpp"
> 33: #include "runtime/mutex.hpp"
Does this need to be in the header?
src/hotspot/share/runtime/mutexLocker.hpp line 90:
> 88: extern Monitor* Notify_lock; // a lock used to
> synchronize the start-up of the vm
> 89: extern Mutex* ExceptionCache_lock; // a lock used to
> synchronize exception cache updates
> 90: extern Mutex* TableMerge_lock; // a lock used to
> synchronize merging a thread-local table into a global table
Might call this `ObjectCountMerge_lock` and describe its usage as `merging a
thread local object count`. There are many `Tables` in the JVM (SymbolTable,
InternedStrings, etc.).
test/jdk/jdk/jfr/event/gc/objectcount/TestObjectCountAfterGCEventWithShenandoah.java
line 11:
> 9: * & vm.opt.ExplicitGCInvokesConcurrent != true
> 10: * @library /test/lib /test/jdk
> 11: * @run main/othervm -XX:+UnlockExperimentalVMOptions
> -XX:-UseFastUnorderedTimeStamps -XX:+UseShenandoahGC -XX:MarkSweepDeadRatio=0
> -XX:-UseCompressedOops -XX:-UseCompressedClassPointers
> -XX:+IgnoreUnrecognizedVMOptions
> jdk.jfr.event.gc.objectcount.TestObjectCountAfterGCEventWithShenandoah
Are all of these flags necessary to run this test? Can we pare any unnecessary
options?
-------------
Changes requested by wkemper (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/26977#pullrequestreview-3178372422
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317339492
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317344826
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317348710
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317355707
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317358657
PR Review Comment: https://git.openjdk.org/jdk/pull/26977#discussion_r2317360739