> First, credit to @TobiHartmann for the diagnostic, and a lot of the solution.
> 
> # Diagnostic
> 
> According to [Strict Field Initialization 
> JEP](https://openjdk.org/jeps/8350458), when a strict field is being 
> initialized, it is not quite immutable, but observably immutable: at first, 
> the field can only be set (during the early larval phase), then it can 
> only be read (late larval or initialized phase), so the last value set is the 
> only value one can ever observe.
> 
> The interesting part is that in the early larval phase, a field can be subject 
> to side effects. When applied to a value object, that means that until it 
> reaches the unrestricted state, it is not yet immutable. While not 
> theoretically necessary, avoiding scalarization and keeping the value object 
> behind a reference is a convenient way to make sure that side effects are 
> correctly applied. This strategy means that we shouldn't scalarize before 
> reaching the unrestricted state. Normally, in C2, finding out what is early 
> larval or not is the job of bytecode parsing, but in OSR compilation, 
> the code before the StartOSR is not parsed, and thus some objects are 
> soundly but conservatively assumed to be possibly larval when they actually 
> aren't. In the reported example, that leads to a drastic performance 
> difference between OSR and non-OSR compilation: the latter is able to 
> eliminate allocations since it knows more precisely when the value object can 
> be scalarized.
> 
> In the original example:
> 
> public value class MyNumber {
>     private long d0;
>     private MyNumber(long d0) { this.d0 = d0; }
>     public MyNumber add(long v) { return new MyNumber(d0 + v); }
> 
>     private static void loop() {
>         MyNumber dec = new MyNumber(123);
>         for (int i = 0; i < 1_000_000_000; ++i) {
>             dec = dec.add(i);
>         }
>     }
> 
>     public static void main(String[] args) {
>         for (int i = 0; i < 10; ++i) {
>             loop();
>         }
>     }
> }
> 
> OSR happens in the loop in `loop`, but there `dec` is not detected to be 
> unrestricted (and hence immutable and scalarizable), so the allocation in the 
> inlined `add` still needs to happen because we need the buffer for the new 
> `dec`. The first iteration in `main` traps at the exit of the loop (unstable 
> if), OSR happens again, followed by a non-OSR compilation that correctly finds 
> that `dec` can be scalarized in the loop, making the third iteration fast.
> 
> # Solution
> 
> Overall, the solution requires improving our detection of early larval 
> values. Since we keep parsing as-is, let's do that in `ciTypeFlow`. Th...
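
As an aside to the quoted description, the "set only, then read only" discipline from the Diagnostic section can be modeled in plain Java. This is a minimal sketch and not HotSpot, C2, or `ciTypeFlow` code: the class `StrictFieldModel`, its `Phase` enum, and all method names are hypothetical, and the rule in `scalarizable()` only restates the strategy described above (no scalarization before the unrestricted state).

```java
/** Hypothetical model of the phases described above; not HotSpot code. */
public class StrictFieldModel {
    /** Phase names follow the wording of the quoted description. */
    public enum Phase { EARLY_LARVAL, LATE_LARVAL, UNRESTRICTED }

    private Phase phase = Phase.EARLY_LARVAL;
    private long value;

    /** Writes are allowed only during the early larval phase. */
    public void set(long v) {
        if (phase != Phase.EARLY_LARVAL) {
            throw new IllegalStateException("write after the early larval phase");
        }
        value = v; // a later write overrides an earlier one
    }

    /** Leaves the early larval phase: from here on the field is read-only. */
    public void endEarlyLarval() {
        if (phase != Phase.EARLY_LARVAL) {
            throw new IllegalStateException("already past the early larval phase");
        }
        phase = Phase.LATE_LARVAL;
    }

    /** Marks the object as fully constructed. */
    public void finishConstruction() {
        if (phase != Phase.LATE_LARVAL) {
            throw new IllegalStateException("not in the late larval phase");
        }
        phase = Phase.UNRESTRICTED;
    }

    /** Reads are allowed only once writes are no longer possible. */
    public long get() {
        if (phase == Phase.EARLY_LARVAL) {
            throw new IllegalStateException("read during the early larval phase");
        }
        return value;
    }

    /** Restates the strategy above: scalarize only in the unrestricted state. */
    public boolean scalarizable() {
        return phase == Phase.UNRESTRICTED;
    }

    public static void main(String[] args) {
        StrictFieldModel f = new StrictFieldModel();
        f.set(1);
        f.set(2);                 // the last write wins
        f.endEarlyLarval();
        System.out.println("observed value: " + f.get());
        System.out.println("scalarizable while larval: " + f.scalarizable());
        f.finishConstruction();
        System.out.println("scalarizable when unrestricted: " + f.scalarizable());
    }
}
```

In this model, an OSR compilation that cannot prove an object has reached `UNRESTRICTED` has to treat `scalarizable()` as false, which corresponds to the conservative assumption described above.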

Marc Chevalier has updated the pull request incrementally with two additional 
commits since the last revision:

 - Rephrase summary
 - +copyright

-------------

Changes:
  - all: https://git.openjdk.org/valhalla/pull/1531/files
  - new: https://git.openjdk.org/valhalla/pull/1531/files/acce992f..183dfe97

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=valhalla&pr=1531&range=01
 - incr: https://webrevs.openjdk.org/?repo=valhalla&pr=1531&range=00-01

  Stats: 28 lines in 2 files changed: 25 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/valhalla/pull/1531.diff
  Fetch: git fetch https://git.openjdk.org/valhalla.git pull/1531/head:pull/1531

PR: https://git.openjdk.org/valhalla/pull/1531
