On Mon, 4 May 2026 23:03:06 GMT, Xiaolong Peng <[email protected]> wrote:
>> - [x] I confirm that I make this contribution in accordance with the >> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai). >> >> Shenandoah always allocates memory with heap lock, we have observed heavy >> heap lock contention on memory allocation path in performance analysis of >> some service in which we tried to adopt Shenandoah. This change is to >> propose an optimization for the code path of memory allocation to improve >> heap lock contention, along with the optimization, a better OOD is also done >> to Shenandoah memory allocation to reuse the majority of the code: >> >> * ShenandoahAllocator: base class of the allocators, most of the allocation >> code is in this class. >> * ShenandoahMutatorAllocator: allocator for mutator, inherit from >> ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, >> `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator >> for mutator. >> * ShenandoahCollectorAllocator: allocator for collector allocation in >> Collector partition, similar to ShenandoahMutatorAllocator, only few lines >> of code to customize the allocator for Collector. >> * ShenandoahOldCollectorAllocator: allocator for mutator collector >> allocation in OldCollector partition, it doesn't inherit the logic from >> ShenandoahAllocator for now, the `allocate` method has been overridden to >> delegate to `FreeSet::allocate_for_collector` due to the special allocation >> considerations for `plab` in old gen. We will rewrite this part later and >> move the code out of `FreeSet::allocate_for_collector` >> >> I'm not expecting significant performance impact for most of the cases since >> in most case the contention on heap lock it not high enough to cause >> performance issue, but in some cases it may improve the latency/performance: >> >> 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from >> 500+us to less than 150us, p99 from 1000+us to ~200us. >> >> java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G >> -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions >> -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit >> -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar >> ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered >> full smoothing" >> >> >> Openjdk TIP: >> >> ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% >> 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max >> 428584 usec, measured over 524288 events ===== >> ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 9... > > Xiaolong Peng has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains 386 commits: > > - Merge branch 'openjdk:master' into cas-alloc-1 > - Document _collector_allocator_reserved ordering contract > - Tame TestCASAllocContention: fewer threads, shorter run, smaller retention > - Comment: pair reserve-time inflation with read-time compensation > > Cross-reference ShenandoahFreeSet::reserve_alloc_regions_internal at > every 'bytes_allocated - mutator_allocator_remaining' subtraction site. > No behavior change. > - Include mutator allocator remaining bytes in unsafe_max_tlab_alloc > - Reserve mutator alloc regions after abbreviated degenerated GC > - fix: new jtreg tests miss -XX:+UnlockDiagnosticVMOptions > - fix: sync _top before CAS in unset_active_alloc_region > - fix: sync _top once on CAS success and reset_age after the loop in > unset_active_alloc_region > - Add jtreg tests for CAS allocator flag combinations and contention > - ... and 376 more: https://git.openjdk.org/jdk/compare/ebb3d688...ce68d1fa No, this requires significantly more work. This is way too hard to review. You need to drastically and mercilessly simplify. I believe it would be significantly simpler if you can introduce "allocator" interface separately without any semantic changes first. _Then_ extend that allocator with CAS allocations support. Additionally, I think it clashes with https://github.com/openjdk/jdk/pull/31047, and that one probably needs to go in first. src/hotspot/share/gc/shared/plab.cpp line 40: > 38: } > 39: > 40: size_t PLAB::min_byte_size() { There is a single use. Inline the implementation to avoid extra shared code changes. src/hotspot/share/gc/shared/plab.cpp line 48: > 46: } > 47: > 48: size_t PLAB::max_byte_size() { I don't think this is used anywhere. src/hotspot/share/gc/shenandoah/heuristics/shenandoahAdaptiveHeuristics.cpp line 421: > 419: size_t allocated_bytes_since_last_sample = 0; > 420: > 421: { I think this clashes with https://github.com/openjdk/jdk/pull/31047, which simplifies this path. src/hotspot/share/gc/shenandoah/shenandoahAffiliation.hpp line 28: > 26: #define SHARE_GC_SHENANDOAH_SHENANDOAHAFFILIATION_HPP > 27: > 28: #include "utilities/debug.hpp" Do you need it? If you don't use anything from the header in this file, move the include where it is actually needed. ------------- Changes requested by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/26171#pullrequestreview-4282998779 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r3235410521 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r3235405129 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r3235449980 PR Review Comment: https://git.openjdk.org/jdk/pull/26171#discussion_r3235419377
