On Mon, 15 Jul 2024 00:50:30 GMT, Axel Boldt-Christmas <abold...@openjdk.org> wrote:
>> When inflating a monitor the `ObjectMonitor*` is written directly over the >> `markWord` and any overwritten data is displaced into a displaced >> `markWord`. This is problematic for concurrent GCs which needs extra care or >> looser semantics to use this displaced data. In Lilliput this data also >> contains the klass forcing this to be something that the GC has to take into >> account everywhere. >> >> This patch introduces an alternative solution where locking only uses the >> lock bits of the `markWord` and inflation does not override and displace the >> `markWord`. This is done by keeping associations between objects and >> `ObjectMonitor*` in an external hash table. Different caching techniques are >> used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is >> only supported in combination with the LM_LIGHTWEIGHT locking mode (the >> default). >> >> This patch has been evaluated to be performance neutral when >> `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` >> and `UseObjectMonitorTable` works. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a >> `ObjectMonitorContentionMark` witness object has been introduced to the >> signatures. Which signals that the contentions reference counter is being >> held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact >> more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to >> create a more understandable and enforceable API. There are a handful of >> invariants / assumptions which are not always explicitly asserted which >> could be trivially abstracted and verified by the type system by using >> similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in >> the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to >> avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with 10 > additional commits since the last revision: > > - Remove try_read > - Add explicit to single parameter constructors > - Remove superfluous access specifier > - Remove unused include > - Update assert message OMCache::set_monitor > - Fix indentation > - Remove outdated comment LightweightSynchronizer::exit > - Remove logStream include > - Remove strange comment > - Fix javaThread include Another review pass by me. It looks to me like the cache lookup can be improved, see comments below. src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 323: > 321: ldr(t1, Address(t3_t)); > 322: cmp(obj, t1); > 323: br(Assembler::EQ, monitor_found); I think the loop could be optimized a bit, if we start with the (cache_address) - 1 in t3, then increment t3 at the start of the loop, and let the success-case fall-through and only branch back to loop-start or to failure-path. Something like: bind(loop); increment(t3_t, in_bytes(OMCache::oop_to_oop_difference())); ldr(t1, Address(t3_t)); cbnz(t1, loop); cmp(obj, t1); br(Assembler::NE, loop); // Success Advantage would be that we have no forward-branch in the fast/expected case. CPU static branch prediction tends to not like that. I'm not sure if if makes a difference, though. Also, if you do that, then the unrolled loop also needs corresponding adjustment. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 674: > 672: > 673: // Search for obj in cache. > 674: bind(loop); Same loop transformation would be possible here. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 776: > 774: movl(top, Address(thread, JavaThread::lock_stack_top_offset())); > 775: > 776: if (!UseObjectMonitorTable) { Why is the mark loaded here in the !UOMT case, but later in the +UOMT case? ------------- PR Review: https://git.openjdk.org/jdk/pull/20067#pullrequestreview-2179942149 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1679210139 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1679313050 PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1679315158