On Sun, 4 Jun 2023 21:39:58 GMT, Kelvin Nilsen <kdnil...@openjdk.org> wrote:

>> OpenJDK Colleagues:
>> 
>> Please review this proposed integration of Generational mode for Shenandoah 
>> GC under https://bugs.openjdk.org/browse/JDK-8307314.
>> 
>> Generational mode of Shenandoah is enabled by adding 
>> `-XX:+UnlockExperimentalVMOptions -XX:ShenandoahGCMode=generational` to a 
>> command line that already specifies `-XX:+UseShenandoahGC`.  The
>> implementation automatically adjusts the sizes of old generation and young 
>> generation to efficiently utilize the entire heap capacity.  Generational 
>> mode of Shenandoah resembles G1 in the following regards:
>> 
>> 1. Old-generation marking runs concurrently during the time that multiple 
>> young generation collections run to completion.
>> 2. After old-generation marking completes, we perform a sequence of mixed 
>> collections.  Each mixed collection combines collection of young generation 
>> with evacuation of a portion of the old-generation regions identified for 
>> collection based on old-generation marking information.
>> 3. Unlike G1, young-generation collections and evacuations are entirely 
>> concurrent, as with single-generation Shenandoah.
>> 4. As with single-generation Shenandoah, there is no explicit notion of eden 
>> and survivor space within the young generation.  In practice, regions that 
>> were most recently allocated tend to have large amounts of garbage and these 
>> regions tend to be collected with very little effort.  Young-generation 
>> objects that survive garbage collection tend to accumulate in regions that 
>> hold survivor objects.  These regions tend to have smaller amounts of 
>> garbage, and are less likely to be collected.  If they survive a sufficient 
>> number of young-generation collections, the “survivor” regions are promoted 
>> into the old generation.
>> 
>> We expect to refine heuristics as we gain experience with more production 
>> workloads.  In the future, we plan to remove the “experimental” qualifier 
>> from generational mode, at which time we expect that generational mode will 
>> become the default mode for Shenandoah.
>> 
>> **Testing**: We continuously run jtreg tiers 1-4 + hotspot_gc_shenandoah, 
>> gcstress, jck compiler, jck runtime, Dacapo, SpecJBB, SpecVM, Extremem, 
>> HyperAlloc, and multiple AWS production workload simulators. We test on 
>> Linux x64 and aarch64, Alpine x64 and aarch64, macOS x64 and aarch64, and 
>> Windows x64.
>
> Kelvin Nilsen has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Remove three asserts making comparisons between atomic volatile variables
>   
>   Though changes to the volatile variables are individually protected by
>   Atomic load and store operations, these asserts were not assuring
>   atomic access to multiple volatile variables, each of which could be
>   modified independently of the others.  The asserts were therefore not
>   trustworthy, as has been confirmed by more extensive testing.

Thanks, Thomas, for the feedback:

These proposed changes improve both generational and non-generational modes of 
operation.  We can revert them if that is desired, or we can specialize 
generational versions of these parameters so that they take different values in 
different modes, but here is a bit of background.  We have done considerable 
testing on a variety of synthetic workloads and some limited testing on 
production workloads.  As we move toward upstream integration, we expect this 
will help us gain exposure to more production workloads.  The following changes 
were based on the results of this testing:

* Decrease ShenandoahLearningSteps to 5 (from 10): For some workloads, we 
observed far too many learning cycles being triggered.  We also observed that 
the learning achieved during learning cycles was less trustworthy than the 
learning achieved during actual operation, for two reasons: learning cycles 
typically trigger during initialization phases that are not representative of 
real-world operation, and they usually trigger so early that allocated objects 
have not had enough time to die before we garbage collect.
* Change ShenandoahImmediateThreshold from 90 to 70: In experiments with 
settings on certain real production workloads, we discovered that reducing the 
threshold for abbreviated cycles significantly improved throughput, reduced 
degenerated cycles, and reduced high-percentile end-to-end latency on the 
relevant services.  These experiments were based on single-generation 
Shenandoah.  We saw no negative impact from this change on our various 
workloads.
* I'll let @earthling-amzn comment on the change to 
ShenandoahAdaptiveDecayFactor.  My recollection is that this change was also 
motivated by experience with single-generation Shenandoah on a real production 
workload.
* The change of ShenandoahFullGCThreshold from 3 to 64 was motivated by 
observations of specjbb performance as it ratchets up the workload to 
determine MaxJOPS.  For both single-generation and generational Shenandoah, 
the typical behavior was that a single Full GC trigger caused an "infinite" 
sequence of Full GCs, even when we had lost the concurrent GC race by only a 
small amount.  This happens because (1) Full GC discards all the incremental 
work of the concurrent GC that was just interrupted, (2) the STW Full GC pause 
builds up pent-up demand for execution and allocation, so there is a huge 
demand for allocation immediately after Full GC ends, and (3) the concurrent 
GC that triggers immediately after Full GC completes is "destined" to fail: no 
garbage has been introduced since Full GC finished, SATB does not collect 
floating garbage that accumulates after the start of concurrent GC, and the 
allocation spike immediately following the Full GC is very high (e.g. 11 GB/s 
instead of the normal 3 GB/s).  The new value allows a sequence of degenerated 
GCs to manage slow evolution and sudden bursts of allocation rate much more 
effectively than the original code.  This is accompanied by a change in how we 
detect and throw OOM: we wait for at least one Full GC, but we do not force 
ShenandoahFullGCThreshold allocation failures before throwing OOM.
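For reference, the tuning changes discussed above can be pinned explicitly on 
the command line.  This is an illustrative sketch, not a recommendation: the 
values shown are the new defaults described in this thread (previous defaults 
in comments), and `app.jar` is a placeholder application:

```shell
# Run with generational Shenandoah, pinning the discussed flag values.
# All flags below are experimental, hence -XX:+UnlockExperimentalVMOptions.
java -XX:+UseShenandoahGC \
     -XX:+UnlockExperimentalVMOptions \
     -XX:ShenandoahGCMode=generational \
     -XX:ShenandoahLearningSteps=5 \
     -XX:ShenandoahImmediateThreshold=70 \
     -XX:ShenandoahFullGCThreshold=64 \
     -jar app.jar
# Previous defaults: ShenandoahLearningSteps=10,
# ShenandoahImmediateThreshold=90, ShenandoahFullGCThreshold=3.
```

Omitting any of these flags would simply fall back to the build's defaults, so 
they are only needed to experiment with or revert the values discussed here.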

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14185#issuecomment-1578800487
