On Tue, 16 Jul 2024 12:36:08 GMT, Roman Kennke <rken...@openjdk.org> wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with 10 
>> additional commits since the last revision:
>> 
>>  - Remove try_read
>>  - Add explicit to single parameter constructors
>>  - Remove superfluous access specifier
>>  - Remove unused include
>>  - Update assert message OMCache::set_monitor
>>  - Fix indentation
>>  - Remove outdated comment LightweightSynchronizer::exit
>>  - Remove logStream include
>>  - Remove strange comment
>>  - Fix javaThread include
>
> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 674:
> 
>> 672: 
>> 673:       // Search for obj in cache.
>> 674:       bind(loop);
> 
> Same loop transformation would be possible here.

I tried the following (see diff below) and it shows about a 5-10% regression in 
most of the `LockUnlock.testInflated*` micros. I also tried with just 
`num_unrolled = 1` and saw the same regression. Maybe there was some other 
pattern you were thinking of. There are probably architecture and platform 
differences. This can and should probably be explored in a followup PR.



diff --git a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
index 5dbfdbc225d..4e6621cfece 100644
--- a/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
@@ -663,25 +663,28 @@ void C2_MacroAssembler::fast_lock_lightweight(Register obj, Register box, Regist
 
       const int num_unrolled = 2;
       for (int i = 0; i < num_unrolled; i++) {
-        cmpptr(obj, Address(t));
-        jccb(Assembler::equal, monitor_found);
-        increment(t, in_bytes(OMCache::oop_to_oop_difference()));
+        Label next;
+        cmpptr(obj, Address(t, OMCache::oop_to_oop_difference() * i));
+        jccb(Assembler::notEqual, next);
+        increment(t, in_bytes(OMCache::oop_to_oop_difference() * i));
+        jmpb(monitor_found);
+        bind(next);
       }
+      increment(t, in_bytes(OMCache::oop_to_oop_difference() * (num_unrolled - 1)));
 
       Label loop;
 
       // Search for obj in cache.
       bind(loop);
-
-      // Check for match.
-      cmpptr(obj, Address(t));
-      jccb(Assembler::equal, monitor_found);
-
+      // Advance.
+      increment(t, in_bytes(OMCache::oop_to_oop_difference()));
       // Search until null encountered, guaranteed _null_sentinel at end.
       cmpptr(Address(t), 1);
       jcc(Assembler::below, slow_path); // 0 check, but with ZF=0 when *t == 0
-      increment(t, in_bytes(OMCache::oop_to_oop_difference()));
-      jmpb(loop);
+
+      // Check for match.
+      cmpptr(obj, Address(t));
+      jccb(Assembler::notEqual, loop);
 
       // Cache hit.
       bind(monitor_found);
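
For readers less familiar with the assembler DSL, the two loop shapes in the diff above can be sketched in plain C++. This is only an illustration under assumptions: `Cache`, `kCapacity`, `kUnrolled`, and both function names are hypothetical, the cache is modeled as a zero-padded array with a trailing zero sentinel, and the sketch says nothing about why the generated code was slower.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the per-thread cache: kCapacity slots followed by
// one always-zero sentinel slot. Unused slots are zero.
constexpr std::size_t kCapacity = 8;
constexpr std::size_t kUnrolled = 2;
using Cache = std::array<std::uintptr_t, kCapacity + 1>;

// Shape before the diff: match check, then null check, then advance, closed
// by an unconditional backward jump.
// Returns the slot index of obj, or -1 for the slow path. obj must be non-zero.
int find_check_then_advance(const Cache& cache, std::uintptr_t obj) {
  std::size_t t = 0;
  for (std::size_t i = 0; i < kUnrolled; i++) {
    if (cache[t] == obj) return static_cast<int>(t);
    t++;
  }
  for (;;) {
    if (cache[t] == obj) return static_cast<int>(t);
    if (cache[t] == 0) return -1;  // empty slot or sentinel
    t++;
  }
}

// Shape after the diff: advance, then null check, then a single conditional
// backward branch taken on mismatch.
// Returns the slot index of obj, or -1 for the slow path. obj must be non-zero.
int find_advance_then_check(const Cache& cache, std::uintptr_t obj) {
  for (std::size_t i = 0; i < kUnrolled; i++) {
    if (cache[i] == obj) return static_cast<int>(i);
  }
  std::size_t t = kUnrolled - 1;
  do {
    t++;
    if (cache[t] == 0) return -1;  // empty slot or sentinel
  } while (cache[t] != obj);
  return static_cast<int>(t);
}
```

Both functions visit the same slots in the same order and return the same result for any non-zero `obj`; they differ only in how the loop body is arranged around the backward branch.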

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20067#discussion_r1715249312
