Hi Martin,

Your change works OK on arm32 with a minor correction; see the attached patch.

thanks,
Boris

On 16.07.2019 16:31, Doerr, Martin wrote:
Hi,

the current implementation of FastJNIAccessors ignores the flag -XX:+UseFastJNIAccessors 
when the JVMTI capability "can_post_field_access" is enabled.
This restriction is unnecessary: it makes field accesses (Get<Type>Field) from 
native code slower whenever an attached JVMTI agent enables this capability.
A better implementation checks at runtime whether an agent actually wants to 
receive field access events.

Note that the bytecode interpreter already uses this better implementation by 
checking if field access watch events were requested 
(JvmtiExport::_field_access_count != 0).

I have implemented such a runtime check on all platforms which currently 
support FastJNIAccessors.

My new jtreg test runtime/jni/FastGetField/FastGetField.java contains a micro 
benchmark. Its output file
test-support/jtreg_test_hotspot_jtreg_runtime_jni_FastGetField/runtime/jni/FastGetField/FastGetField.jtr
shows the duration of 10000 iterations with and without UseFastJNIAccessors 
(the JVMTI agent gets attached in both runs).
My Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz needed 4.7ms with FastJNIAccessors 
and 11.2ms without it.

Webrev:
http://cr.openjdk.java.net/~mdoerr/8227680_FastJNIAccessors/webrev.00/

We have run the test on 64-bit x86 platforms, SPARC, and aarch64.
(FastJNIAccessors are not yet available on PPC64 and s390; I'll contribute them 
later.)
My webrev also contains 32-bit implementations for x86 and arm, but they are 
completely untested. It would be great if somebody could volunteer to review 
and test on these platforms.

Please review.

Best regards,
Martin

--- a/src/hotspot/cpu/arm/jniFastGetField_arm.cpp	2019-07-26 13:29:34.569851539 +0300
+++ b/src/hotspot/cpu/arm/jniFastGetField_arm.cpp	2019-07-26 13:31:34.441884864 +0300
@@ -32,7 +32,7 @@
 
 #define __ masm->
 
-#define BUFFER_SIZE  96
+#define BUFFER_SIZE  120
 
 address JNI_FastGetField::generate_fast_get_int_field0(BasicType type) {
   const char* name = NULL;
@@ -114,7 +114,7 @@
 
   if (JvmtiExport::can_post_field_access()) {
     // Using barrier to order wrt. JVMTI check and load of result.
-    __ membar(Assembler::LoadLoad, Rtmp1);
+    __ membar(MacroAssembler::LoadLoad, Rtmp1);
 
     // Check to see if a field access watch has been set before we
     // take the fast path.
@@ -191,7 +191,7 @@
 
   if (JvmtiExport::can_post_field_access()) {
     // Order JVMTI check and load of result wrt. succeeding check.
-    __ membar(Assembler::LoadLoad, Rtmp2);
+    __ membar(MacroAssembler::LoadLoad, Rtmp2);
     __ ldr_s32(Rsafept_cnt2, Address(Rsafepoint_counter_addr));
   } else {
     // Address dependency restricts memory access ordering. It's cheaper than explicit LoadLoad barrier
