> - [x] I confirm that I make this contribution in accordance with the
> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
> 
> Shenandoah always allocates memory under the heap lock. We observed heavy
> heap lock contention on the memory allocation path during performance
> analysis of a service in which we tried to adopt Shenandoah. This change
> proposes an optimization of the memory allocation code path to reduce heap
> lock contention. Along with the optimization, the object-oriented design of
> Shenandoah memory allocation is improved so that most of the code is reused:
> 
> * ShenandoahAllocator: base class of the allocators; most of the allocation
> code is in this class.
> * ShenandoahMutatorAllocator: allocator for mutators. It inherits from
> ShenandoahAllocator and overrides only the methods `alloc_start_index`,
> `verify`, `_alloc_region_count` and `_yield_to_safepoint` to customize the
> allocator for mutators.
> * ShenandoahCollectorAllocator: allocator for collector allocation in the
> Collector partition. Like ShenandoahMutatorAllocator, it needs only a few
> lines of code to customize the allocator for the Collector.
> * ShenandoahOldCollectorAllocator: allocator for mutator collector
> allocation in the OldCollector partition. It does not inherit the logic from
> ShenandoahAllocator for now; its `allocate` method is overridden to
> delegate to `FreeSet::allocate_for_collector` because of the special
> allocation considerations for `plab` in the old generation. We will rewrite
> this part later and move the code out of `FreeSet::allocate_for_collector`.
> 
> I don't expect a significant performance impact in most cases, since
> contention on the heap lock is usually not high enough to cause a
> performance issue, but in some cases it may improve latency/performance:
> 
> 1. DaCapo lusearch test on an EC2 host with 96 CPU cores: p90 improved from
> 500+us to less than 150us, p99 from 1000+us to ~200us.
> 
> java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+UnlockDiagnosticVMOptions -XX:-ShenandoahUncommit -XX:ShenandoahGCMode=generational -XX:+UseTLAB -jar ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar -n 10 lusearch | grep "metered full smoothing"
> 
> 
> Openjdk TIP:
> 
> ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 428584 usec, measured over 524288 events =====
> ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 usec, 99% 5898 usec, 99.9% 6488 ...
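
The allocator hierarchy described in the quoted text can be pictured roughly as follows. This is a minimal sketch, not the code from the pull request: the `*Sketch` classes, the `Regions` model (a vector of free byte counts) and `free_set_allocate_for_collector` are hypothetical stand-ins, and the start indices chosen for each subclass are assumptions made only so the example has visible behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical region model: each entry is the free bytes left in one region.
using Regions = std::vector<std::size_t>;

// Stand-in for the base class: it owns the shared allocation loop.
class AllocatorSketch {
public:
  explicit AllocatorSketch(Regions& regions) : _regions(regions) {}
  virtual ~AllocatorSketch() = default;

  // Returns the index of the region that served the request, or -1.
  virtual int allocate(std::size_t bytes) {
    assert(bytes > 0);
    const std::size_t n = _regions.size();
    const std::size_t start = alloc_start_index();
    for (std::size_t i = 0; i < n; i++) {
      const std::size_t idx = (start + i) % n;
      if (_regions[idx] >= bytes) {
        _regions[idx] -= bytes;
        return static_cast<int>(idx);
      }
    }
    return -1;
  }

protected:
  // Customization point: where the scan over regions begins.
  virtual std::size_t alloc_start_index() const = 0;
  Regions& _regions;
};

// Mutator flavor: only the hook differs (start index is an assumption).
class MutatorAllocatorSketch : public AllocatorSketch {
public:
  using AllocatorSketch::AllocatorSketch;
protected:
  std::size_t alloc_start_index() const override { return 0; }
};

// Collector flavor: again only the hook differs (start index is an assumption).
class CollectorAllocatorSketch : public AllocatorSketch {
public:
  using AllocatorSketch::AllocatorSketch;
protected:
  std::size_t alloc_start_index() const override {
    return _regions.empty() ? 0 : _regions.size() - 1;
  }
};

// Hypothetical stand-in for the separate free-set path: scans from the last region down.
inline int free_set_allocate_for_collector(Regions& regions, std::size_t bytes) {
  for (std::size_t i = regions.size(); i-- > 0;) {
    if (regions[i] >= bytes) {
      regions[i] -= bytes;
      return static_cast<int>(i);
    }
  }
  return -1;
}

// Old-collector flavor: bypasses the shared loop and delegates instead.
class OldCollectorAllocatorSketch : public AllocatorSketch {
public:
  using AllocatorSketch::AllocatorSketch;
  int allocate(std::size_t bytes) override {
    return free_set_allocate_for_collector(_regions, bytes);
  }
protected:
  std::size_t alloc_start_index() const override { return 0; }  // unused here
};
```

The point of the shape is that the mutator and collector variants differ from the base class only in small hooks, while the old-collector variant replaces `allocate` wholesale until its delegate can be folded into the shared path.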
Xiaolong Peng has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 386 commits:

 - Merge branch 'openjdk:master' into cas-alloc-1
 - Document _collector_allocator_reserved ordering contract
 - Tame TestCASAllocContention: fewer threads, shorter run, smaller retention
 - Comment: pair reserve-time inflation with read-time compensation
   
   Cross-reference ShenandoahFreeSet::reserve_alloc_regions_internal at every
   'bytes_allocated - mutator_allocator_remaining' subtraction site. No
   behavior change.
 - Include mutator allocator remaining bytes in unsafe_max_tlab_alloc
 - Reserve mutator alloc regions after abbreviated degenerated GC
 - fix: new jtreg tests miss -XX:+UnlockDiagnosticVMOptions
 - fix: sync _top before CAS in unset_active_alloc_region
 - fix: sync _top once on CAS success and reset_age after the loop in unset_active_alloc_region
 - Add jtreg tests for CAS allocator flag combinations and contention
 - ... and 376 more: https://git.openjdk.org/jdk/compare/ebb3d688...ce68d1fa

-------------

Changes: https://git.openjdk.org/jdk/pull/26171/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=103
  Stats: 2332 lines in 36 files changed: 1930 ins; 239 del; 163 mod
  Patch: https://git.openjdk.org/jdk/pull/26171.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171

PR: https://git.openjdk.org/jdk/pull/26171
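
The commit titles above refer to a CAS allocator and to a `_top` pointer. As general background, and not as a rendering of the patch itself, here is a minimal sketch of CAS-based bump-pointer allocation with a locked slow path. All names (`RegionSketch`, `CasAllocatorSketch`, `par_allocate`, `slow_path_entries`) are hypothetical, and the policy of moving on to the next region when the active one cannot fit a request is an assumption made to keep the example short.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical region: a [0, end) byte range with an atomic bump pointer.
struct RegionSketch {
  static constexpr std::size_t kFailed = static_cast<std::size_t>(-1);

  std::atomic<std::size_t> top{0};
  const std::size_t end;

  explicit RegionSketch(std::size_t capacity) : end(capacity) {}

  // Lock-free allocation: returns the start offset, or kFailed if it does not fit.
  std::size_t par_allocate(std::size_t bytes) {
    assert(bytes > 0);
    std::size_t old_top = top.load(std::memory_order_relaxed);
    while (true) {
      if (bytes > end - old_top) return kFailed;
      // On failure, compare_exchange_weak reloads old_top and the loop retries.
      if (top.compare_exchange_weak(old_top, old_top + bytes,
                                    std::memory_order_relaxed)) {
        return old_top;
      }
    }
  }
};

class CasAllocatorSketch {
public:
  CasAllocatorSketch(std::size_t region_count, std::size_t region_bytes) {
    for (std::size_t i = 0; i < region_count; i++) {
      _regions.emplace_back(region_bytes);
    }
  }

  // Returns {region index, offset}; the region index is -1 when nothing fits.
  std::pair<int, std::size_t> allocate(std::size_t bytes) {
    // Fast path: try the active region without taking the lock.
    std::size_t idx = _active.load(std::memory_order_acquire);
    if (idx < _regions.size()) {
      const std::size_t off = _regions[idx].par_allocate(bytes);
      if (off != RegionSketch::kFailed) return {static_cast<int>(idx), off};
    }
    // Slow path: under the lock, look for a later region and make it active.
    // Leftover space in skipped regions is abandoned in this sketch.
    std::lock_guard<std::mutex> guard(_heap_lock);
    _slow_path_entries++;
    for (idx = _active.load(std::memory_order_relaxed); idx < _regions.size(); idx++) {
      const std::size_t off = _regions[idx].par_allocate(bytes);
      if (off != RegionSketch::kFailed) {
        _active.store(idx, std::memory_order_release);
        return {static_cast<int>(idx), off};
      }
    }
    return {-1, std::size_t{0}};
  }

  // Number of times the locked path was taken (read while no allocation is running).
  int slow_path_entries() const { return _slow_path_entries; }

private:
  std::deque<RegionSketch> _regions;   // deque: elements are neither copied nor moved
  std::atomic<std::size_t> _active{0}; // index of the region used by the fast path
  std::mutex _heap_lock;               // stands in for the heap lock
  int _slow_path_entries = 0;          // written only while holding _heap_lock
};
```

In this shape the lock is taken only when the active region is exhausted, so most allocations contend on a single atomic word rather than on the lock.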
