Hi Ioi,

On 2/06/2020 8:56 am, Ioi Lam wrote:


On 5/31/20 11:14 PM, David Holmes wrote:
Hi Jiangli,

On 29/05/2020 9:02 am, Jiangli Zhou wrote:
(Looping in serviceability-dev@openjdk.java.net ...)

Hi David and Ioi,

On Wed, May 27, 2020 at 11:15 PM David Holmes <david.hol...@oracle.com> wrote:

Hi Jiangli,

On 28/05/2020 11:35 am, Ioi Lam wrote:


On 5/27/20 6:17 PM, Jiangli Zhou wrote:
On Wed, May 27, 2020 at 1:56 PM Ioi Lam <ioi....@oracle.com> wrote:
On 5/26/20 6:21 PM, Jiangli Zhou wrote:

Focusing on the link state for archived classes in this thread, I
updated the webrev to only set archived boot classes to 'linked' state at restore time. More investigations can be done for archived classes
for other builtin loaders.

https://bugs.openjdk.java.net/browse/JDK-8232222
http://cr.openjdk.java.net/~jiangli/8232222/webrev.02/

Please let me know if there is any additional concerns to the change.

Best regards,
Jiangli

Hi Jiangli,

I think the change is fine. I am wondering if this

2530   if (!BytecodeVerificationLocal &&
2531 loader_data->is_the_null_class_loader_data()) {
2532     _init_state = linked;
2533   }


can be changed to

          if (!BytecodeVerificationLocal &&
loader_data->is_the_null_class_loader_data() &&
              !JvmtiExport::should_post_class_prepare())

That way, there's no need to change systemDictionary.cpp.


I was going to take the suggestion, but realized that it would add
unnecessary complications for archived boot classes with class
pre-initialization support. Some agents may set
JvmtiExport::should_post_class_prepare(). It's worthwhile to support
class pre-init uniformly for archived boot classes with
JvmtiExport::should_post_class_prepare() enabled or disabled.

This would introduce behavioral changes when JVMTI is enabled:

+ The order of JvmtiExport::post_class_prepare is different than before + JvmtiExport::post_class_prepare may be called for a class that was not
called before (if the class is never linked during run time)
+ JvmtiExport::post_class_prepare was called inside the init_lock, now
it's called outside of the init_lock

I have to say I share Ioi's concerns here. This change will impact JVM
TI agents in a way we can't be sure of. From a specification perspective
I think we are fine as linking can be lazy or eager, so there's no
implied order either. But this would be a behavioural change that will
be observable by agents. (I'm less concerned about the init_lock
situation as it seems potentially buggy to me to call out to an agent
with the init_lock held in the first place! I find it hard to imagine an
agent only working correctly if the init_lock is held.)


Totally agree that we need to be very careful here (that's also part
of the reason why I separated this into an individual RFE for the
dedicated discussion). David, thanks for the analysis from the spec
perspective! Agreed with the init_lock comment also. In the future, I
think we can even get rid of the needs for init_lock completely for
some of the pre-initialized classes.

This change has gone through extensive testing since the later part of
last year and has been in use (with the default CDS) with agents that
do post_class_prepare. Hopefully that would ease some of the concerns.

That is good to know, but that is just one sample of a set of agents.

This would need a CSR request and involvement of the serviceabilty folk,
to work through any potential issues.


I've looped in serviceability-dev@openjdk.java.net for this
discussion. Chris or Serguei could you please take a look of the
change, http://cr.openjdk.java.net/~jiangli/8232222/webrev.02/,
specifically the JvmtiExport::post_class_prepare change in
systemDictionary.cpp.

Filing a CSR request sounds good to me. The CSR looks after source,
binary, and behavioral compatibility. From a behavior point of view,
the change most likely does not cause any visible effects to a JVMTI
agent (based on what's observed in testing and usages). What should be
included in the CSR?

The CSR request should explain the behavioural change that will be observable by agents, and all of the potential compatibility issues that might arise from that - pointing out of course that as the spec (JVMS 5.4**) allows for eager or lazy linking, agents shouldn't be relying on the exact timing or order of events.

** I note this section has some additional constraints regarding dynamically computed constants that might also come into play with this pre-linking for CDS classes.

I think the CSR should also include the benefit of doing this. It's not a lot of code change, but now we have to maintain two different code paths for post_class_prepare to be called.

The CSR is concerned only with the compatibility aspects of a change. The cost:benefit ratio is an engineering decision that should be discussed here in the RFR.

David
-----

JVMTI agents will typically introduce quite a bit of overhead in start-up, so a reduction in the range of 0.2~0.4ms seems a drop to the bucket. I'd rather keep the VM simple unless we have a strong reason to make it more complicated.

Thanks
- Ioi

Cheers,
David
-----

Ioi's suggestion avoids this problem, but, as you note, at the expense
of disabling this optimisation if an agent is attached and wants class
prepare events.


Right, if we handle that case conditionally, we would alway need to
store the cached static field values separately since the dump time
cannot foresee if the runtime can set boot classes in 'linked' state
(and 'fully_initialized' state with the planned changes) at restore
time. As a result, we need to handle all pre-initialized static fields
like what we are doing today, which is storing them in the archived
class_info_records then installing them to the related fields at
runtime. That causes both unwanted memory and CPU overhead at runtime.

I also updated the webrev.02 in place with typo fixes. Thanks!

Best regards,
Jiangli

Thanks,
David

Thanks
- Ioi


BTW, I was wondering where the performance came from, so I wrote an
investigative patch:

diff -r 0702191777c9 src/hotspot/share/oops/instanceKlass.cpp
--- a/src/hotspot/share/oops/instanceKlass.cpp    Thu May 21 15:56:27
2020 -0700
+++ b/src/hotspot/share/oops/instanceKlass.cpp    Wed May 27 10:48:57
2020 -0700
@@ -866,6 +866,13 @@
        return true;
      }

+  if (UseSharedSpaces && !BytecodeVerificationLocal &&
is_shared_boot_class()) {
+    Handle h_init_lock(THREAD, init_lock());
+    ObjectLocker ol(h_init_lock, THREAD, h_init_lock() != NULL);
+    set_init_state(linked);
+    return true;
+  }
+
      // trace only the link time for this klass that includes
      // the verification time
      PerfClassTraceTime vmtimer(ClassLoader::perf_class_link_time(),


Benchmarking results (smaller numbers are better):

(baseline vs your patch)

             baseline jiangli                           baseline
jiangli
      1:     58514375    57755638 (-758737) -----     40.266
40.135 (
-0.131)      -
      2:     58506426    57754623 (-751803) -----     40.367
39.417 (
-0.950)      -----
      3:     58498554    57759735 (-738819) -----     40.513
39.970 (
-0.543)      ---
      4:     58491265    57751296 (-739969) -----     40.439
40.268 (
-0.171)      -
      5:     58500588    57750975 (-749613) -----     40.569
40.080 (
-0.489)      --
      6:     58497015    57744418 (-752597) -----     41.097
40.147 (
-0.950)      -----
      7:     58494335    57749909 (-744426) -----     39.983 40.214
(  0.231)     +
      8:     58500401    57750305 (-750096) -----     40.235 40.417
(  0.182)     +
      9:     58490728    57767463 (-723265) -----     40.354
39.928 (
-0.426)      --
     10:     58497858    57746557 (-751301) -----     40.756
39.706 (
-1.050)      -----
============================================================
             58499154    57753091 (-746062) -----     40.457
40.027 (
-0.430)      --
instr delta =      -746062    -1.2753%
time  delta =       -0.430 ms -1.0619%


(baseline vs my patch)

             baseline    ioi baseline  ioi
      1:     58503574    57821124 (-682450)      ----- 40.554 39.783 (
-0.771)      -----
      2:     58499325    57819459 (-679866) -----     40.092 40.325
(  0.233)    ++
      3:     58492362    57811978 (-680384) -----     40.546
39.826 (
-0.720)      -----
      4:     58488655    57828878 (-659777) -----     40.270 40.550
(  0.280)    ++
      5:     58501567    57830179 (-671388) -----     40.382
40.145 (
-0.237)      --
      6:     58496552    57808774 (-687778) -----     40.702
40.527 (
-0.175)      -
      7:     58482701    57808925 (-673776) -----     40.268
39.849 (
-0.419)      ---
      8:     58493831    57807810 (-686021) -----     40.396
39.940 (
-0.456)      ---
      9:     58489388    57811354 (-678034) -----     40.575
40.078 (
-0.497)      ---
     10:     58482512    57795489 (-687023) -----     40.084 40.247
(  0.163)     +
============================================================
             58493046    57814396 (-678650) -----     40.386
40.126 (
-0.260)      --
instr delta =      -678650    -1.1602%
time  delta =       -0.260 ms -0.6445%


(your patch vs my patch)

             jiangli ioi                              jiangli ioi
      1:     57716711    57782622 ( 65911) ++++          41.042 40.302 (
-0.740)      -----
      2:     57709666    57780196 ( 70530) ++++          40.334 40.965 (
0.631)  ++++
      3:     57716074    57803315 ( 87241) +++++          40.239 39.823 (
-0.416)      ---
      4:     57725152    57782719 ( 57567) +++          40.430 39.805 (
-0.625)      ----
      5:     57719799    57787187 ( 67388) ++++          40.138 40.003 (
-0.135)      -
      6:     57721922    57769193 ( 47271) +++          40.324 40.207 (
-0.117)      -
      7:     57716438    57785212 ( 68774) ++++          39.978 40.149 (
0.171)     +
      8:     57713834    57778797 ( 64963) ++++          40.359 40.210 (
-0.149)      -
      9:     57711272    57786376 ( 75104) ++++          40.575 40.724 (
0.149)     +
     10:     57711660    57780548 ( 68888) ++++          40.291 40.091 (
-0.200)      -
============================================================
             57716252    57783615 ( 67363) ++++          40.370 40.226 (
-0.144)      -
instr delta =        67363     0.1167%
time  delta =       -0.144 ms -0.3560%


These numbers show that the majority of the time spent (678650
instructions) inside InstanceKlass::link_class_impl is spent from the
PerfClassTraceTime. Walking of the class hierarchy and taking the
h_init_lock only takes about 67363 instructions).

Due to this finding, I filed two more RFEs:

https://bugs.openjdk.java.net/browse/JDK-8246019
PerfClassTraceTime slows down VM start-up

It's related to JDK-8246020, and I've commented on the bug (see
JDK-8246020 comments). UsePerfData for perf data collection is common
in cloud usages. It's better to keep UsePerfData enabled by default.

https://bugs.openjdk.java.net/browse/JDK-8246015
Method::link_method is called twice for CDS methods

That was addressed as part of the initial change for JDK-8232222:
http://cr.openjdk.java.net/~jiangli/8232222/weberv.02/src/hotspot/share/oops/instanceKlass.cpp.frames.html


It's cleaner to handle it separately, so I removed it from the latest
version. I've assigned JDK-8246015 to myself and will address it
separately. Thanks for recording the separate bug.

Thanks!
Jiangli


Thanks
- Ioi


Reply via email to