product(bool, UsePerfData,
true, \
"Flag to disable jvmstat instrumentation for performance
testing "\
"and problem isolation
purposes") \
The flag default value set to true --- should we change that? If the
flag set to false at default, performance can benefit from that. If
users want to collect performance data, should explicitly turn it on.
Thanks
Yumin
On 6/16/20 8:19 PM, Ioi Lam wrote:
On 6/16/20 6:20 PM, David Holmes wrote:
Hi Ioi,
On 17/06/2020 6:14 am, Ioi Lam wrote:
https://bugs.openjdk.java.net/browse/JDK-8246019
http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/
PerfClassTraceTime is (a rarely used feature) for measuring the time
spent during class linking and initialization.
"A special command jcmd <process id/main class> PerfCounter.print
prints all performance counters in the process."
How do you know this is a "rarely used feature"?
Hi David,
Sure, the counter will be dumped, but by "rarely used" -- I mean no
one will find this particular counter useful, and no one will be
actively looking at it.
I changed two parts of the code -- class init and class linking.
For class initialization, the counter may be useful for people who
want to know how much time is spent in their <clinit> functions, and
my patch doesn't change that. It only avoids using the counter when a
class has no <clinit>, i.e., we know that the counter counts nothing
(except for a logging statement).
=====
For class linking, no user code is executed, so it only measures VM
code. If it's useful for anyone, that would be VM engineers like me
who are trying to optimize the speed of class loading. However, due to
the overhead of the counter vs what it's trying to measure, the
results are pretty meaningless.
Note that I've not disabled the counter altogether. Instead, I disable
it only when linking a CDS shared class, and we know that very little
is happening for this class (e.g., no verification).
I think the class linking timer might have been useful 15 years ago
when it was introduced, or it might be useful today when CDS is
disabled. But with CDS enabled, we are paying a constant price that
seems to benefit no one.
I think we should short-circuit it when it seems appropriate. If this
indeed causes problems for our users, it's easy to re-enable it.
That's better than just keeping this forever just because we're afraid
to touch anything.
I find it hard to evaluate whether this short-circuiting of the time
tracing is reasonable or not. Obviously any monitoring mechanism
should impose minimal overhead compared to what is being measured,
and these timers fall short in that regard. But if these stats become
meaningless then they may as well be removed.
I think the serviceability folk (cc'd) need to evaluate this in the
context of the M&M tools.
However, it's quite expensive and it needs to start and stop a bunch
of timers. With CDS, it's quite often for the overhead of the timer
itself to be much more than the time it's trying to measure, giving
unreliable measurement.
In this patch, when it's clear that the init and linking will be
very quick, I disable the timer and count only the number of
invocations. This shows a small improvement in start-up
I'm curious if you tried to forcing EagerInitialization to be true to
see how that improves the baseline. I've always noticed eager_init in
the code, but hadn't realized it is disabled by default.
I think it cannot be done by default, as it will violate the JLS. A
class can be initialized only when it's touched by bytecodes.
It can also backfire as we may load many classes without initializing
them. E.g., during bytecode verification, we load many classes and
just check that one is a supertype of another.
Thanks
- Ioi
Cheers,
David
-----
Results of " perf stat -r 100 bin/java -Xshare:on
-XX:SharedArchiveFile=jdk2.jsa -Xint -version "
59623970 59341935 (-282035) ----- 41.774 41.591 ( -0.183) -
59623495 59331646 (-291849) ----- 41.696 41.165 ( -0.531) --
59627148 59329526 (-297622) ----- 41.249 41.094 ( -0.155) -
59612439 59340760 (-271679) ---- 41.773 40.657 ( -1.116) -----
59626438 59335681 (-290757) ----- 41.683 40.901 ( -0.782) ----
59618436 59338953 (-279483) ----- 41.861 41.249 ( -0.612) ---
59608782 59340173 (-268609) ---- 41.198 41.508 ( 0.310) +
59614612 59325177 (-289435) ----- 41.397 41.738 ( 0.341) ++
59615905 59344006 (-271899) ---- 41.921 40.969 ( -0.952) ----
59635867 59333147 (-302720) ----- 41.491 40.836 ( -0.655) ---
================================================
59620708 59336100 (-284608) ----- 41.604 41.169 ( -0.434) --
instruction delta = -284608 -0.4774%
time delta = -0.434 ms -1.0435%
The number of PerfClassTraceTime's used is reduced from 564 to 116
(so we have an overhead of about 715 instructions per use, yikes!).