On Wed, 27 Oct 2021 11:53:47 GMT, Aleksey Shipilev <sh...@openjdk.org> wrote:
> `Unsafe.storeStoreFence` currently delegates to the stronger `Unsafe.storeFence`.
> We can teach compilers to map this directly to the already existing rules that
> handle `MemBarStoreStore`. As with the explicit `LoadFence`/`StoreFence`, we
> introduce a special node to differentiate the explicit fence from implicit
> store-store barriers. `storeStoreFence` is usually used to simulate safe
> `final`-field-like constructions in special JDK classes, such as
> `ConstantCallSite` and friends.
>
> Motivating performance differences on the benchmarks from JDK-8276054 on ARM32
> (Raspberry Pi 4):
>
> Benchmark                     Mode  Cnt   Score   Error  Units
> Multiple.plain                avgt    3   2.669 ± 0.004  ns/op
> Multiple.release              avgt    3  16.688 ± 0.057  ns/op
> Multiple.storeStore           avgt    3  14.021 ± 0.144  ns/op  // Better
>
> MultipleWithLoads.plain       avgt    3   4.672 ± 0.053  ns/op
> MultipleWithLoads.release     avgt    3  16.689 ± 0.044  ns/op
> MultipleWithLoads.storeStore  avgt    3  14.012 ± 0.010  ns/op  // Better
>
> MultipleWithStores.plain      avgt    3  14.687 ± 0.009  ns/op
> MultipleWithStores.release    avgt    3  45.393 ± 0.192  ns/op
> MultipleWithStores.storeStore avgt    3  38.048 ± 0.033  ns/op  // Better
>
> Publishing.plain              avgt    3  27.079 ± 0.201  ns/op
> Publishing.release            avgt    3  27.088 ± 0.241  ns/op
> Publishing.storeStore         avgt    3  27.009 ± 0.259  ns/op  // Within error, hidden by allocation
>
> Single.plain                  avgt    3   2.670 ± 0.002  ns/op
> Single.releaseFence           avgt    3   6.675 ± 0.001  ns/op
> Single.storeStoreFence        avgt    3   8.012 ± 0.027  ns/op  // Worse, seems to be an ARM32 implementation artifact
>
> The same benchmarks on AArch64 (Raspberry Pi 3):
>
> Benchmark                     Mode  Cnt   Score   Error  Units
> Multiple.plain                avgt    3   5.914 ± 0.115  ns/op
> Multiple.release              avgt    3  10.149 ± 0.059  ns/op
> Multiple.storeStore           avgt    3   6.757 ± 0.138  ns/op  // Better
>
> MultipleWithLoads.plain       avgt    3  11.849 ± 0.331  ns/op
> MultipleWithLoads.release     avgt    3  35.565 ± 1.144  ns/op
> MultipleWithLoads.storeStore  avgt    3  19.441 ± 0.471  ns/op  // Better
>
> MultipleWithStores.plain      avgt    3   5.920 ± 0.213  ns/op
> MultipleWithStores.release    avgt    3  20.286 ± 0.347  ns/op
> MultipleWithStores.storeStore avgt    3  12.686 ± 0.230  ns/op  // Better
>
> Publishing.plain              avgt    3  22.261 ± 1.630  ns/op
> Publishing.release            avgt    3  22.269 ± 0.576  ns/op
> Publishing.storeStore         avgt    3  17.464 ± 0.397  ns/op  // Better
>
> Single.plain                  avgt    3   5.916 ± 0.063  ns/op
> Single.release                avgt    3  10.148 ± 0.401  ns/op
> Single.storeStore             avgt    3   6.767 ± 0.164  ns/op  // Better
>
> As expected, this does not affect x86_64 at all, because both `release` and
> `storeStore` are effectively no-ops there and only affect compiler optimizations:
>
> Benchmark                     Mode  Cnt   Score   Error  Units
> Multiple.plain                avgt    3   0.406 ± 0.002  ns/op
> Multiple.release              avgt    3   0.409 ± 0.018  ns/op
> Multiple.storeStore           avgt    3   0.406 ± 0.001  ns/op
>
> MultipleWithLoads.plain       avgt    3   4.328 ± 0.006  ns/op
> MultipleWithLoads.release     avgt    3   4.600 ± 0.014  ns/op
> MultipleWithLoads.storeStore  avgt    3   4.602 ± 0.006  ns/op
>
> MultipleWithStores.plain      avgt    3   0.812 ± 0.001  ns/op
> MultipleWithStores.release    avgt    3   0.812 ± 0.002  ns/op
> MultipleWithStores.storeStore avgt    3   0.812 ± 0.002  ns/op
>
> Publishing.plain              avgt    3   6.370 ± 0.059  ns/op
> Publishing.release            avgt    3   6.358 ± 0.436  ns/op
> Publishing.storeStore         avgt    3   6.367 ± 0.054  ns/op
>
> Single.plain                  avgt    3   0.407 ± 0.039  ns/op
> Single.releaseFence           avgt    3   0.406 ± 0.001  ns/op
> Single.storeStoreFence        avgt    3   0.406 ± 0.001  ns/op
>
> Additional testing:
>  - [x] Linux x86_64 fastdebug `tier1`
>  - [x] Linux AArch64 fastdebug `tier1`
>  - [x] Linux x86_64 Fences benchmark
>  - [x] Linux AArch64 Fences benchmark
>  - [x] Linux ARM32 Fences benchmark
>  - [x] Linux AArch64 jcstress `quick` run

This pull request has now been integrated.
Changeset: b7a06be9
Author:    Aleksey Shipilev <sh...@openjdk.org>
URL:       https://git.openjdk.java.net/jdk/commit/b7a06be98d3057dac4adbb7f4071ac62cf88fe52
Stats:     38 lines in 16 files changed: 32 ins; 5 del; 1 mod

8252990: Intrinsify Unsafe.storeStoreFence

Reviewed-by: dholmes, thartmann, whuang

-------------

PR: https://git.openjdk.java.net/jdk/pull/6136
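The quoted description says `storeStoreFence` is used to simulate safe `final`-field-like publication. Outside the JDK, the same fence is reachable through the public `VarHandle.storeStoreFence()` API (Java 9+), which in OpenJDK is a thin wrapper over `Unsafe.storeStoreFence` and so should pick up this intrinsic. The following is a minimal sketch under those assumptions: the `StoreStorePublish` and `Holder` classes are hypothetical, not taken from the JDK. Note that under the Java Memory Model a racy read of a non-`final` field is still a data race, so the sketch only illustrates the writer-side ordering, not a formal guarantee.

```java
import java.lang.invoke.VarHandle;

public class StoreStorePublish {
    // Plain, non-volatile shared reference used for publication.
    static Holder shared;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> shared = new Holder(42));
        writer.start();
        // join() establishes happens-before, so this demo always prints 42;
        // the fence only matters for readers that race with the publication.
        writer.join();
        System.out.println(shared.value);
    }
}

// Hypothetical class: 'value' plays the role of a field that cannot be final.
final class Holder {
    int value;

    Holder(int value) {
        this.value = value;
        // Writer-side ordering: the store to 'value' above is not reordered
        // with any later store, such as the one that publishes this object.
        VarHandle.storeStoreFence();
    }
}
```

Running `java StoreStorePublish.java` (Java 11+ single-file launch) prints `42`. The design choice mirrors the benchmarks: `VarHandle.releaseFence()` would also work here, but it additionally orders earlier loads against later stores, while a constructor that only needs its own stores to land before publication needs nothing stronger than store-store ordering.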