Hi Jiangli,
On 29/05/2020 9:02 am, Jiangli Zhou wrote:
(Looping in serviceability-dev@openjdk.java.net ...)
Hi David and Ioi,
On Wed, May 27, 2020 at 11:15 PM David Holmes <david.hol...@oracle.com> wrote:
Hi Jiangli,
On 28/05/2020 11:35 am, Ioi Lam wrote:
On 5/27/20 6:17 PM, Jiangli Zhou wrote:
On Wed, May 27, 2020 at 1:56 PM Ioi Lam <ioi....@oracle.com> wrote:
On 5/26/20 6:21 PM, Jiangli Zhou wrote:
Focusing on the link state for archived classes in this thread, I
updated the webrev to only set archived boot classes to 'linked' state
at restore time. More investigations can be done for archived classes
for other builtin loaders.
https://bugs.openjdk.java.net/browse/JDK-8232222
http://cr.openjdk.java.net/~jiangli/8232222/webrev.02/
Please let me know if there is any additional concerns to the change.
Best regards,
Jiangli
Hi Jiangli,
I think the change is fine. I am wondering if this
2530 if (!BytecodeVerificationLocal &&
2531 loader_data->is_the_null_class_loader_data()) {
2532 _init_state = linked;
2533 }
can be changed to
if (!BytecodeVerificationLocal &&
loader_data->is_the_null_class_loader_data() &&
!JvmtiExport::should_post_class_prepare())
That way, there's no need to change systemDictionary.cpp.
I was going to take the suggestion, but realized that it would add
unnecessary complications for archived boot classes with class
pre-initialization support. Some agents may set
JvmtiExport::should_post_class_prepare(). It's worthwhile to support
class pre-init uniformly for archived boot classes with
JvmtiExport::should_post_class_prepare() enabled or disabled.
This would introduce behavioral changes when JVMTI is enabled:
+ The order of JvmtiExport::post_class_prepare is different than before
+ JvmtiExport::post_class_prepare may be called for a class that was not
called before (if the class is never linked during run time)
+ JvmtiExport::post_class_prepare was called inside the init_lock, now
it's called outside of the init_lock
I have to say I share Ioi's concerns here. This change will impact JVM
TI agents in a way we can't be sure of. From a specification perspective
I think we are fine as linking can be lazy or eager, so there's no
implied order either. But this would be a behavioural change that will
be observable by agents. (I'm less concerned about the init_lock
situation as it seems potentially buggy to me to call out to an agent
with the init_lock held in the first place! I find it hard to imagine an
agent only working correctly if the init_lock is held.)
Totally agree that we need to be very careful here (that's also part
of the reason why I separated this into an individual RFE for the
dedicated discussion). David, thanks for the analysis from the spec
perspective! Agreed with the init_lock comment also. In the future, I
think we can even get rid of the needs for init_lock completely for
some of the pre-initialized classes.
This change has gone through extensive testing since the later part of
last year and has been in use (with the default CDS) with agents that
do post_class_prepare. Hopefully that would ease some of the concerns.
That is good to know, but that is just one sample of a set of agents.
This would need a CSR request and involvement of the serviceabilty folk,
to work through any potential issues.
I've looped in serviceability-dev@openjdk.java.net for this
discussion. Chris or Serguei could you please take a look of the
change, http://cr.openjdk.java.net/~jiangli/8232222/webrev.02/,
specifically the JvmtiExport::post_class_prepare change in
systemDictionary.cpp.
Filing a CSR request sounds good to me. The CSR looks after source,
binary, and behavioral compatibility. From a behavior point of view,
the change most likely does not cause any visible effects to a JVMTI
agent (based on what's observed in testing and usages). What should be
included in the CSR?
The CSR request should explain the behavioural change that will be
observable by agents, and all of the potential compatibility issues that
might arise from that - pointing out of course that as the spec (JVMS
5.4**) allows for eager or lazy linking, agents shouldn't be relying on
the exact timing or order of events.
** I note this section has some additional constraints regarding
dynamically computed constants that might also come into play with this
pre-linking for CDS classes.
Cheers,
David
-----
Ioi's suggestion avoids this problem, but, as you note, at the expense
of disabling this optimisation if an agent is attached and wants class
prepare events.
Right, if we handle that case conditionally, we would alway need to
store the cached static field values separately since the dump time
cannot foresee if the runtime can set boot classes in 'linked' state
(and 'fully_initialized' state with the planned changes) at restore
time. As a result, we need to handle all pre-initialized static fields
like what we are doing today, which is storing them in the archived
class_info_records then installing them to the related fields at
runtime. That causes both unwanted memory and CPU overhead at runtime.
I also updated the webrev.02 in place with typo fixes. Thanks!
Best regards,
Jiangli
Thanks,
David
Thanks
- Ioi
BTW, I was wondering where the performance came from, so I wrote an
investigative patch:
diff -r 0702191777c9 src/hotspot/share/oops/instanceKlass.cpp
--- a/src/hotspot/share/oops/instanceKlass.cpp Thu May 21 15:56:27
2020 -0700
+++ b/src/hotspot/share/oops/instanceKlass.cpp Wed May 27 10:48:57
2020 -0700
@@ -866,6 +866,13 @@
return true;
}
+ if (UseSharedSpaces && !BytecodeVerificationLocal &&
is_shared_boot_class()) {
+ Handle h_init_lock(THREAD, init_lock());
+ ObjectLocker ol(h_init_lock, THREAD, h_init_lock() != NULL);
+ set_init_state(linked);
+ return true;
+ }
+
// trace only the link time for this klass that includes
// the verification time
PerfClassTraceTime vmtimer(ClassLoader::perf_class_link_time(),
Benchmarking results (smaller numbers are better):
(baseline vs your patch)
baseline jiangli baseline
jiangli
1: 58514375 57755638 (-758737) ----- 40.266
40.135 (
-0.131) -
2: 58506426 57754623 (-751803) ----- 40.367
39.417 (
-0.950) -----
3: 58498554 57759735 (-738819) ----- 40.513
39.970 (
-0.543) ---
4: 58491265 57751296 (-739969) ----- 40.439
40.268 (
-0.171) -
5: 58500588 57750975 (-749613) ----- 40.569
40.080 (
-0.489) --
6: 58497015 57744418 (-752597) ----- 41.097
40.147 (
-0.950) -----
7: 58494335 57749909 (-744426) ----- 39.983 40.214
( 0.231) +
8: 58500401 57750305 (-750096) ----- 40.235 40.417
( 0.182) +
9: 58490728 57767463 (-723265) ----- 40.354
39.928 (
-0.426) --
10: 58497858 57746557 (-751301) ----- 40.756
39.706 (
-1.050) -----
============================================================
58499154 57753091 (-746062) ----- 40.457
40.027 (
-0.430) --
instr delta = -746062 -1.2753%
time delta = -0.430 ms -1.0619%
(baseline vs my patch)
baseline ioi baseline ioi
1: 58503574 57821124 (-682450) ----- 40.554 39.783 (
-0.771) -----
2: 58499325 57819459 (-679866) ----- 40.092 40.325
( 0.233) ++
3: 58492362 57811978 (-680384) ----- 40.546
39.826 (
-0.720) -----
4: 58488655 57828878 (-659777) ----- 40.270 40.550
( 0.280) ++
5: 58501567 57830179 (-671388) ----- 40.382
40.145 (
-0.237) --
6: 58496552 57808774 (-687778) ----- 40.702
40.527 (
-0.175) -
7: 58482701 57808925 (-673776) ----- 40.268
39.849 (
-0.419) ---
8: 58493831 57807810 (-686021) ----- 40.396
39.940 (
-0.456) ---
9: 58489388 57811354 (-678034) ----- 40.575
40.078 (
-0.497) ---
10: 58482512 57795489 (-687023) ----- 40.084 40.247
( 0.163) +
============================================================
58493046 57814396 (-678650) ----- 40.386
40.126 (
-0.260) --
instr delta = -678650 -1.1602%
time delta = -0.260 ms -0.6445%
(your patch vs my patch)
jiangli ioi jiangli ioi
1: 57716711 57782622 ( 65911) ++++ 41.042 40.302 (
-0.740) -----
2: 57709666 57780196 ( 70530) ++++ 40.334 40.965 (
0.631) ++++
3: 57716074 57803315 ( 87241) +++++ 40.239 39.823 (
-0.416) ---
4: 57725152 57782719 ( 57567) +++ 40.430 39.805 (
-0.625) ----
5: 57719799 57787187 ( 67388) ++++ 40.138 40.003 (
-0.135) -
6: 57721922 57769193 ( 47271) +++ 40.324 40.207 (
-0.117) -
7: 57716438 57785212 ( 68774) ++++ 39.978 40.149 (
0.171) +
8: 57713834 57778797 ( 64963) ++++ 40.359 40.210 (
-0.149) -
9: 57711272 57786376 ( 75104) ++++ 40.575 40.724 (
0.149) +
10: 57711660 57780548 ( 68888) ++++ 40.291 40.091 (
-0.200) -
============================================================
57716252 57783615 ( 67363) ++++ 40.370 40.226 (
-0.144) -
instr delta = 67363 0.1167%
time delta = -0.144 ms -0.3560%
These numbers show that the majority of the time spent (678650
instructions) inside InstanceKlass::link_class_impl is spent from the
PerfClassTraceTime. Walking of the class hierarchy and taking the
h_init_lock only takes about 67363 instructions).
Due to this finding, I filed two more RFEs:
https://bugs.openjdk.java.net/browse/JDK-8246019
PerfClassTraceTime slows down VM start-up
It's related to JDK-8246020, and I've commented on the bug (see
JDK-8246020 comments). UsePerfData for perf data collection is common
in cloud usages. It's better to keep UsePerfData enabled by default.
https://bugs.openjdk.java.net/browse/JDK-8246015
Method::link_method is called twice for CDS methods
That was addressed as part of the initial change for JDK-8232222:
http://cr.openjdk.java.net/~jiangli/8232222/weberv.02/src/hotspot/share/oops/instanceKlass.cpp.frames.html
It's cleaner to handle it separately, so I removed it from the latest
version. I've assigned JDK-8246015 to myself and will address it
separately. Thanks for recording the separate bug.
Thanks!
Jiangli
Thanks
- Ioi