Re: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v104]

Xiaolong Peng Mon, 04 May 2026 16:04:03 -0700

> - [x] I confirm that I make this contribution in accordance with the [OpenJDK 
> Interim AI Policy](https://openjdk.org/legal/ai).
> 
> Shenandoah always allocates memory with heap lock, we have observed heavy 
> heap lock contention on memory allocation path in performance analysis of 
> some service in which we tried to adopt Shenandoah. This change is to propose 
> an optimization for the code path of memory allocation to improve heap lock 
> contention, along with the optimization, a better OOD is also done to 
> Shenandoah memory allocation to reuse the majority of the code:
> 
> * ShenandoahAllocator: base class of the allocators, most of the allocation 
> code is in this class.
> * ShenandoahMutatorAllocator: allocator for mutator, inherit from 
> ShenandoahAllocator, only override methods `alloc_start_index`, `verify`, 
> `_alloc_region_count` and  `_yield_to_safepoint` to customize the allocator 
> for mutator.
> * ShenandoahCollectorAllocator: allocator for collector allocation in 
> Collector partition, similar to ShenandoahMutatorAllocator, only few lines of 
> code to customize the allocator for Collector. 
> * ShenandoahOldCollectorAllocator:  allocator for mutator collector 
> allocation in OldCollector partition, it doesn't inherit the logic from 
> ShenandoahAllocator for now, the `allocate` method has been overridden to 
> delegate to `FreeSet::allocate_for_collector` due to the special allocation 
> considerations for `plab` in old gen. We will rewrite this part later and 
> move the code out of `FreeSet::allocate_for_collector`
> 
> I'm not expecting significant performance impact for most of the cases since 
> in most case the contention on heap lock it not high enough to cause 
> performance issue, but in some cases it may improve the latency/performance:
> 
> 1. Dacapo lusearch test on EC2 host with 96 CPU cores, p90 is improved from 
> 500+us to less than 150us, p99 from 1000+us to ~200us. 
> 
> java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G 
> -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions 
> -XX:+UnlockDiagnosticVMOptions  -XX:-ShenandoahUncommit 
> -XX:ShenandoahGCMode=generational  -XX:+UseTLAB -jar 
> ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar  -n 10 lusearch  | grep "metered 
> full smoothing"
> 
> 
> Openjdk TIP:
> 
> ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 
> 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 
> 428584 usec, measured over 524288 events =====
> ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 
> usec, 99% 5898 usec, 99.9% 6488 ...


Xiaolong Peng has updated the pull request with a new target base due to a 
merge or a rebase. The pull request now contains 386 commits:

 - Merge branch 'openjdk:master' into cas-alloc-1
 - Document _collector_allocator_reserved ordering contract
 - Tame TestCASAllocContention: fewer threads, shorter run, smaller retention
 - Comment: pair reserve-time inflation with read-time compensation
   
   Cross-reference ShenandoahFreeSet::reserve_alloc_regions_internal at
   every 'bytes_allocated - mutator_allocator_remaining' subtraction site.
   No behavior change.
 - Include mutator allocator remaining bytes in unsafe_max_tlab_alloc
 - Reserve mutator alloc regions after abbreviated degenerated GC
 - fix: new jtreg tests miss -XX:+UnlockDiagnosticVMOptions
 - fix: sync _top before CAS in unset_active_alloc_region
 - fix: sync _top once on CAS success and reset_age after the loop in 
unset_active_alloc_region
 - Add jtreg tests for CAS allocator flag combinations and contention
 - ... and 376 more: https://git.openjdk.org/jdk/compare/ebb3d688...ce68d1fa

-------------

Changes: https://git.openjdk.org/jdk/pull/26171/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=103
  Stats: 2332 lines in 36 files changed: 1930 ins; 239 del; 163 mod
  Patch: https://git.openjdk.org/jdk/pull/26171.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171

PR: https://git.openjdk.org/jdk/pull/26171

Re: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v104]

Reply via email to