Re: RFR: 8361099: Shenandoah: Improve heap lock contention by using CAS for memory allocation [v98]

Xiaolong Peng Fri, 01 May 2026 08:07:57 -0700

> - [x] I confirm that I make this contribution in accordance with the [OpenJDK 
> Interim AI Policy](https://openjdk.org/legal/ai).
> 
> Shenandoah always allocates memory while holding the heap lock. We have 
> observed heavy heap lock contention on the memory allocation path in 
> performance analysis of a service in which we tried to adopt Shenandoah. 
> This change proposes an optimization of the memory allocation code path to 
> reduce heap lock contention. Along with the optimization, Shenandoah memory 
> allocation gets a better OOD so that the majority of the code is reused:
> 
> * ShenandoahAllocator: base class of the allocators; most of the allocation 
> code is in this class.
> * ShenandoahMutatorAllocator: allocator for mutators. It inherits from 
> ShenandoahAllocator and only overrides `alloc_start_index`, `verify`, 
> `_alloc_region_count` and `_yield_to_safepoint` to customize the allocator 
> for mutators.
> * ShenandoahCollectorAllocator: allocator for collector allocation in the 
> Collector partition. Like ShenandoahMutatorAllocator, it needs only a few 
> lines of code to customize the allocator for the collector.
> * ShenandoahOldCollectorAllocator: allocator for mutator collector 
> allocation in the OldCollector partition. It doesn't inherit the logic from 
> ShenandoahAllocator for now; the `allocate` method is overridden to 
> delegate to `FreeSet::allocate_for_collector` because of the special 
> allocation considerations for `plab` in old gen. We will rewrite this part 
> later and move the code out of `FreeSet::allocate_for_collector`.
> 
> I'm not expecting a significant performance impact in most cases, since 
> the contention on the heap lock is usually not high enough to cause a 
> performance issue, but in some cases it may improve latency/performance:
> 
> 1. DaCapo lusearch test on an EC2 host with 96 CPU cores: p90 improved from 
> 500+us to less than 150us, and p99 from 1000+us to ~200us.
> 
> java -XX:-TieredCompilation -XX:+AlwaysPreTouch -Xms31G -Xmx31G 
> -XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions 
> -XX:+UnlockDiagnosticVMOptions  -XX:-ShenandoahUncommit 
> -XX:ShenandoahGCMode=generational  -XX:+UseTLAB -jar 
> ~/tools/dacapo/dacapo-23.11-MR2-chopin.jar  -n 10 lusearch  | grep "metered 
> full smoothing"
> 
> 
> OpenJDK TIP:
> 
> ===== DaCapo tail latency, metered full smoothing: 50% 241098 usec, 90% 
> 402356 usec, 99% 411065 usec, 99.9% 411763 usec, 99.99% 415531 usec, max 
> 428584 usec, measured over 524288 events =====
> ===== DaCapo tail latency, metered full smoothing: 50% 902 usec, 90% 3713 
> usec, 99% 5898 usec, 99.9% 6488 ...
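
For readers unfamiliar with the technique named in the subject line, the general idea of replacing a lock with CAS on the allocation path can be sketched as a bump-pointer allocator whose `top` is advanced with a compare-and-swap loop. This is only an illustrative sketch, not code from the PR: `BumpRegion`, `allocate`, `used` and `kFailed` are hypothetical names, offsets stand in for real heap addresses, and the actual Shenandoah allocators handle region selection, refill and safepoints that are not modelled here.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical stand-in for an allocation region; NOT the Shenandoah class.
class BumpRegion {
public:
  // Sentinel returned when the request does not fit in the region.
  static constexpr std::size_t kFailed = static_cast<std::size_t>(-1);

  explicit BumpRegion(std::size_t capacity) : _top(0), _capacity(capacity) {}

  // Lock-free allocation: returns the start offset of the new block,
  // or kFailed if the remaining space is too small.
  std::size_t allocate(std::size_t size) {
    std::size_t old_top = _top.load(std::memory_order_relaxed);
    while (true) {
      // old_top never exceeds _capacity, so the subtraction cannot wrap.
      if (size > _capacity - old_top) {
        return kFailed;
      }
      std::size_t new_top = old_top + size;
      // On failure, compare_exchange_weak reloads old_top with the current
      // value, so the loop retries against the latest state.
      if (_top.compare_exchange_weak(old_top, new_top,
                                     std::memory_order_relaxed)) {
        return old_top;
      }
    }
  }

  std::size_t used() const { return _top.load(std::memory_order_relaxed); }

private:
  std::atomic<std::size_t> _top;   // next free offset in the region
  const std::size_t _capacity;     // total size of the region
};
```

In a design like this, a thread only falls back to a slower, lock-protected path when the region is exhausted, which is where a reduction in lock contention would come from.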


Xiaolong Peng has updated the pull request incrementally with one additional 
commit since the last revision:

  fix: new jtreg tests miss -XX:+UnlockDiagnosticVMOptions

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/26171/files
  - new: https://git.openjdk.org/jdk/pull/26171/files/76ad61c9..ff321f37

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=97
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=26171&range=96-97

  Stats: 14 lines in 2 files changed: 0 ins; 0 del; 14 mod
  Patch: https://git.openjdk.org/jdk/pull/26171.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/26171/head:pull/26171

PR: https://git.openjdk.org/jdk/pull/26171
