Re: RFR(S) 8246019 PerfClassTraceTime slows down VM start-up

Claes Redestad Thu, 18 Jun 2020 04:40:22 -0700



On 2020-06-17 05:19, Ioi Lam wrote:

On 6/16/20 6:20 PM, David Holmes wrote:
Hi Ioi,

On 17/06/2020 6:14 am, Ioi Lam wrote:
https://bugs.openjdk.java.net/browse/JDK-8246019
http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/
PerfClassTraceTime is (a rarely used feature) for measuring the time spent during class linking and initialization.
"A special command jcmd <process id/main class> PerfCounter.print prints all performance counters in the process."
How do you know this is a "rarely used feature"?
Hi David,
Sure, the counter will be dumped, but by "rarely used" -- I mean no one will find this particular counter useful, and no one will be actively looking at it.
I changed two parts of the code -- class init and class linking.
For class initialization, the counter may be useful for people who want to know how much time is spent in their <clinit> functions, and my patch doesn't change that. It only avoids using the counter when a class has no <clinit>, i.e., we know that the counter counts nothing (except for a logging statement).
=====
For class linking, no user code is executed, so it only measures VM code. If it's useful for anyone, that would be VM engineers like me who are trying to optimize the speed of class loading. However, due to the overhead of the counter vs what it's trying to measure, the results are pretty meaningless.
Note that I've not disabled the counter altogether. Instead, I disable it only when linking a CDS shared class, and we know that very little is happening for this class (e.g., no verification).
I think the class linking timer might have been useful 15 years ago when it was introduced, or it might be useful today when CDS is disabled. But with CDS enabled, we are paying a constant price that seems to benefit no one.
I think we should short-circuit it when it seems appropriate. If this indeed causes problems for our users, it's easy to re-enable it. That's better than just keeping this forever just because we're afraid to touch anything.
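
The short-circuiting described above can be pictured with a small standalone sketch. This is not the HotSpot code or the patch itself: PhaseCounters, ScopedPhaseTimer and initialize_class are hypothetical names, and std::chrono stands in for the VM's own timers. The idea shown is only that the invocation count is always updated, while the clock is read only when the caller expects real work.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a pair of perf counters (illustrative names only).
struct PhaseCounters {
    std::uint64_t invocations = 0;  // how many times the phase ran
    std::uint64_t elapsed_ns  = 0;  // accumulated time of the runs that were timed
};

// RAII timer: always bumps the invocation count, but reads the clock only
// when the caller expects the phase to do a measurable amount of work.
class ScopedPhaseTimer {
public:
    ScopedPhaseTimer(PhaseCounters& counters, bool worth_timing)
        : counters_(counters), timed_(worth_timing) {
        ++counters_.invocations;
        if (timed_) start_ = std::chrono::steady_clock::now();
    }
    ~ScopedPhaseTimer() {
        if (!timed_) return;  // short-circuit: no clock read on exit either
        auto d = std::chrono::steady_clock::now() - start_;
        counters_.elapsed_ns += static_cast<std::uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(d).count());
    }
    ScopedPhaseTimer(const ScopedPhaseTimer&) = delete;
    ScopedPhaseTimer& operator=(const ScopedPhaseTimer&) = delete;

private:
    PhaseCounters& counters_;
    bool timed_;
    std::chrono::steady_clock::time_point start_{};
};

// Hypothetical caller: time initialization only if the class has a <clinit>.
inline void initialize_class(PhaseCounters& counters, bool has_clinit) {
    ScopedPhaseTimer timer(counters, has_clinit);
    // ... the <clinit> body would run here when present ...
}
```

With this shape, a class without <clinit> still shows up in the invocation count but contributes nothing to the elapsed time.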


I think this seems like a well-rounded approach overall, but this assumes
that we're mostly measuring the overhead of measurement here. I don't
doubt that's the case for the scenarios you're excluding here and now,
but it's hard to guarantee this property holds in the future.

Perhaps a diagnostic flag to enable timing unconditionally would be
appropriate? With such a flag we could verify that the time deltas of
running some applications with and without the flag roughly match the
time delta in reported linking time. If they diverge, we might need to
adjust the conditions.
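
As a rough illustration of that check (a sketch only; RunStats, deltas_roughly_match and the tolerance are hypothetical, and the numbers would come from real runs with and without such a flag):

```cpp
#include <cmath>

// Hypothetical per-run measurements, both in milliseconds.
struct RunStats {
    double wall_ms;           // total start-up time of the run
    double reported_link_ms;  // linking time as reported by the perf counter
};

// Returns true when the change in wall-clock time between the two runs
// roughly matches the change in reported linking time.
inline bool deltas_roughly_match(const RunStats& always_timed,
                                 const RunStats& short_circuited,
                                 double tolerance_ms) {
    double wall_delta = always_timed.wall_ms - short_circuited.wall_ms;
    double link_delta = always_timed.reported_link_ms
                      - short_circuited.reported_link_ms;
    return std::fabs(wall_delta - link_delta) <= tolerance_ms;
}
```

If the function returns false for a workload, the timers skipped by the short-circuit were measuring more than their own overhead there, and the conditions would need another look.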

I find it hard to evaluate whether this short-circuiting of the time tracing is reasonable or not. Obviously any monitoring mechanism should impose minimal overhead compared to what is being measured, and these timers fall short in that regard. But if these stats become meaningless then they may as well be removed.
I think the serviceability folk (cc'd) need to evaluate this in the context of the M&M tools.


As a complement (or even alternative) there might be ways we can reduce
time-to-measure overheads. E.g., JFR added
FastUnorderedElapsedCounterSource (share/utilities/ticks.hpp) which uses
rdtsc if available (x86 - fallback to os::elapsed_counter otherwise).

This might be a reasonable alternative for the Perf* timers, which
should be short-running events on a single thread.
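
To make the idea concrete, here is a minimal sketch of a tick source that uses rdtsc where available and falls back to a portable clock otherwise. It is not the code in ticks.hpp: fast_unordered_ticks and SKETCH_HAVE_RDTSC are made-up names, the __rdtsc() intrinsic from <x86intrin.h> is assumed to be available (GCC/Clang on x86), and std::chrono::steady_clock stands in for os::elapsed_counter.

```cpp
#include <chrono>
#include <cstdint>

// Use the rdtsc intrinsic on x86 with GCC/Clang; otherwise fall back.
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <x86intrin.h>
#define SKETCH_HAVE_RDTSC 1
#else
#define SKETCH_HAVE_RDTSC 0
#endif

// Returns a raw tick count. With rdtsc the unit is TSC ticks and the read is
// not ordered against surrounding instructions; in the fallback the unit is
// the steady_clock period. Values from the two paths are not comparable.
inline std::uint64_t fast_unordered_ticks() {
#if SKETCH_HAVE_RDTSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}
```

Such a counter only makes sense for the kind of use mentioned above: short intervals measured on one thread, where start and end reads come from the same source.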

/Claes

However, it's quite expensive and it needs to start and stop a bunch of timers. With CDS, it's quite common for the overhead of the timer itself to be much more than the time it's trying to measure, giving unreliable measurement.
In this patch, when it's clear that the init and linking will be very quick, I disable the timer and count only the number of invocations. This shows a small improvement in start-up.
I'm curious if you tried forcing EagerInitialization to be true to see how that improves the baseline. I've always noticed eager_init in the code, but hadn't realized it is disabled by default.
I think it cannot be done by default, as it will violate the JLS. A class can be initialized only when it's touched by bytecodes.
It can also backfire as we may load many classes without initializing them. E.g., during bytecode verification, we load many classes and just check that one is a supertype of another.
Thanks
- Ioi
Cheers,
David
-----
Results of " perf stat -r 100 bin/java -Xshare:on -XX:SharedArchiveFile=jdk2.jsa -Xint -version "
59623970 59341935 (-282035)   -----  41.774  41.591 ( -0.183) -
59623495 59331646 (-291849)   -----  41.696  41.165 ( -0.531) --
59627148 59329526 (-297622)   -----  41.249  41.094 ( -0.155) -
59612439 59340760 (-271679)   ----   41.773  40.657 ( -1.116) -----
59626438 59335681 (-290757)   -----  41.683  40.901 ( -0.782) ----
59618436 59338953 (-279483)   -----  41.861  41.249 ( -0.612) ---
59608782 59340173 (-268609)   ----   41.198  41.508 (  0.310) +
59614612 59325177 (-289435)   -----  41.397  41.738 (  0.341) ++
59615905 59344006 (-271899)   ----   41.921  40.969 ( -0.952) ----
59635867 59333147 (-302720)   -----  41.491  40.836 ( -0.655) ---
================================================
59620708 59336100 (-284608)   -----  41.604  41.169 ( -0.434) --
instruction delta =      -284608    -0.4774%
time        delta =       -0.434 ms -1.0435%
The number of PerfClassTraceTime's used is reduced from 564 to 116 (so we have an overhead of about 715 instructions per use, yikes!).
