On 6/17/20 10:16 PM, David Holmes wrote:
Hi Ioi,
Re EagerInitialize ... skip to bottom part ...
On 18/06/2020 2:34 pm, Ioi Lam wrote:
On 6/17/20 7:38 PM, David Holmes wrote:
Hi Ioi,
On 17/06/2020 1:19 pm, Ioi Lam wrote:
On 6/16/20 6:20 PM, David Holmes wrote:
Hi Ioi,
On 17/06/2020 6:14 am, Ioi Lam wrote:
https://bugs.openjdk.java.net/browse/JDK-8246019
http://cr.openjdk.java.net/~iklam/jdk16/8246019-avoid-PerfClassTraceTime.v01/
PerfClassTraceTime is (a rarely used feature) for measuring the
time spent during class linking and initialization.
"A special command jcmd <process id/main class> PerfCounter.print
prints all performance counters in the process."
How do you know this is a "rarely used feature"?
Hi David,
Sure, the counter will be dumped, but by "rarely used" -- I mean no
one will find this particular counter useful, and no one will be
actively looking at it.
I changed two parts of the code -- class init and class linking.
For class initialization, the counter may be useful for people who
want to know how much time is spent in their <clinit> functions,
and my patch doesn't change that. It only avoids using the counter
when a class has no <clinit>, i.e., we know that the counter counts
nothing (except for a logging statement).
I can see where you are coming from here. Today we keep track of the
time taken to mark a class as initialized when there is no clinit to
execute, and we actually record pure timer overhead as it dwarfs the
simple update to the class state. With your change we now won't
track the time taken to mark the class as initialized. In both cases
the time recorded is inaccurate - in opposite senses. In that regard
your slight underestimation of the class initalization cost seems
better then the present over-estimate.
=====
For class linking, no user code is executed, so it only measures VM
code. If it's useful for anyone, that would be VM engineers like me
who are trying to optimize the speed of class loading. However, due
to the overhead of the counter vs what it's trying to measure, the
results are pretty meaningless.
Note that I've not disabled the counter altogether. Instead, I
disable it only when linking a CDS shared class, and we know that
very little is happening for this class (e.g., no verification).
Yes I have little concern for this part. Linking is a multi-phase
process so "time for linking" is already ill-defined. And the fact
you only do it for CDS makes it even less interesting.
I think the class linking timer might have been useful 15 years ago
when it was introduced, or it might be useful today when CDS is
disabled. But with CDS enabled, we are paying a constant price that
seems to benefit no one.
I think we should short-circuit it when it seems appropriate. If
this indeed causes problems for our users, it's easy to re-enable
it. That's better than just keeping this forever just because we're
afraid to touch anything.
I'm uncomfortable with both the "keep forever as we're too afraid to
change" and "change it now and restore it if anyone complains" ends
of this spectrum. Obviously we need to make progress, but the
"change it now and change back later if needed" is a bit naive, as
once any change is made we can't change back without affecting
another set of users, and we don't know how long it will be before
the change reaches users and the problems return to us. From a CSR
perspective I want to see that due diligence has been applied with
regard to these behavioural changes, and JDK engineers are often not
in a position to understand how end users use this kind of
information. I don't have a solution for that general problem.
In this particular case I think under-estimating the class
initialization overhead is better than the present over-estimate.
Though anyone tracking the trends here will be surprised when the
cost suddenly drops.
Hi David,
I don't have a solution, either. I am in no hurry, and the
improvement is minor.
I could post a CSR and let it stand for a few months to see if anyone
objects. My impression is "linking time" is such an esoteric feature
that no one will care, but I may be wrong. Actually it would be good
if someone tells me I am wrong -- they probably are experiencing some
overhead in class loading that we don't currently know about.
As I mentioned earlier, if anyone is using this timer, it would be VM
engineers who work on class loading. In fact, Yumin fixed 8178349
"Cache builtin class loader constraints to avoid re-initializing
itable/vtable for shared classes" in JDK 15, which significantly
reduced the amount of time spent during class linking. However, we
didn't use this timer for measuring the effectiveness of that fix, as
the overhead and variability are too high. We have reached a point in
the class loading code that we can only make small incremental
improvements, and we can only measure the effect of our changes with
external profilers such as "perf stat -r 200 bin/java -version" that
launches the VM 200 times (and repeat that 10 times) and averages the
elapsed time.
Maybe it's time for the "class linking" timer to go away completely,
or at least be disabled when CDS is enabled. It's pretty much
useless. I wish we had an established deprecation process for such
legacy features.
I find it hard to evaluate whether this short-circuiting of the
time tracing is reasonable or not. Obviously any monitoring
mechanism should impose minimal overhead compared to what is being
measured, and these timers fall short in that regard. But if these
stats become meaningless then they may as well be removed.
I think the serviceability folk (cc'd) need to evaluate this in
the context of the M&M tools.
However, it's quite expensive and it needs to start and stop a
bunch of timers. With CDS, it's quite often for the overhead of
the timer itself to be much more than the time it's trying to
measure, giving unreliable measurement.
In this patch, when it's clear that the init and linking will be
very quick, I disable the timer and count only the number of
invocations. This shows a small improvement in start-up
I'm curious if you tried to forcing EagerInitialization to be true
to see how that improves the baseline. I've always noticed
eager_init in the code, but hadn't realized it is disabled by
default.
I think it cannot be done by default, as it will violate the JLS. A
class can be initialized only when it's touched by bytecodes.
I'm also not sure it violates JLS as you can't directly query if a
class is initialized, but anyway I wasn't suggesting turning this on
by default, I meant only in regard to get a performance baseline to
compare against the changes you have implemented here.
I don't think EagerInitialization will make a difference that's
related to the timer. Both the two instances of the timer will still
be used for exactly the same number of times, just under different
call stacks.
Eager initialization doesn't use any timers.
Aside: I have to wonder whether anyone uses EagerInitialization or
whether we should get rid of it?
I wonder about the same thing. I'll ask around and file an RFE if
appropriate.
It can also backfire as we may load many classes without
initializing them. E.g., during bytecode verification, we load many
classes and just check that one is a supertype of another.
Not sure what is backfiring here ??
Here's an example that both violates the JLS and backfires as it
slows down VM start up, because <clinit> can have observable side
effects:
class Main {
static int X;
public void main(String args[]) {
System.out.println(X);
}
void deadcode() {
Super s = new Child();
s.method();
}
}
class Super {
void method() {}
}
class Child extends Super {
static {
for (int i=0; i<1000000; i++) {
Main.X ++;
}
}
}
When Main is linked, its bytecodes are verified, including
deadcode(). Since deadcode() has an implicit cast of Child to Super,
the verifier needs to load both Child and Super, and check that Child
is indeed a subclass of Super.
If EagerInitialization is enabled, Child will be initialized as soon
as it's entered into the system dictionary. This violates
No it won't. Eager initialization is only applicable when there is no
<clinit> and where the super classes are already initialised:
void InstanceKlass::eager_initialize(Thread *thread) {
if (!EagerInitialization) return;
if (this->is_not_initialized()) {
// abort if the the class has a class initializer
if (this->class_initializer() != NULL) return;
// abort if it is java.lang.Object (initialization is handled in
genesis)
Klass* super_klass = super();
if (super_klass == NULL) return;
// abort if the super class should be initialized
if (!InstanceKlass::cast(super_klass)->is_initialized()) return;
// call body to expose the this pointer
eager_initialize_impl();
}
}
David
-----
Ah, I didn't see that. I should have read the code first :-)
Anyway, Eager initialization will trigger verification which may cause
more classes to be loaded (and initialized if they don't have <clinit>,
and so forth). So it's not equivalent to, and is hard to compare with,
my proposed changes for skipping the timer.
Thanks
- Ioi
https://docs.oracle.com/javase/specs/jls/se14/html/jls-12.html#jls-12.4.1
A class or interface type T will be initialized immediately before
the first occurrence of any one of the following:
T is a class and an instance of T is created.
A static method declared by T is invoked.
A static field declared by T is assigned.
A static field declared by T is used and the field is not a constant
variable (§4.12.4).
....
A class or interface will not be initialized under any other
circumstance.
Also, app developers will generally expect 0 to be printed. So
EagerInitialization will probably break apps in subtle ways.
And initializing Child may recursively load in more classes during
the verification of Child ......
Thanks
- Ioi
Thanks,
David
-----
Thanks
- Ioi
Cheers,
David
-----
Results of " perf stat -r 100 bin/java -Xshare:on
-XX:SharedArchiveFile=jdk2.jsa -Xint -version "
59623970 59341935 (-282035) ----- 41.774 41.591 ( -0.183) -
59623495 59331646 (-291849) ----- 41.696 41.165 ( -0.531) --
59627148 59329526 (-297622) ----- 41.249 41.094 ( -0.155) -
59612439 59340760 (-271679) ---- 41.773 40.657 ( -1.116) -----
59626438 59335681 (-290757) ----- 41.683 40.901 ( -0.782) ----
59618436 59338953 (-279483) ----- 41.861 41.249 ( -0.612) ---
59608782 59340173 (-268609) ---- 41.198 41.508 ( 0.310) +
59614612 59325177 (-289435) ----- 41.397 41.738 ( 0.341) ++
59615905 59344006 (-271899) ---- 41.921 40.969 ( -0.952) ----
59635867 59333147 (-302720) ----- 41.491 40.836 ( -0.655) ---
================================================
59620708 59336100 (-284608) ----- 41.604 41.169 ( -0.434) --
instruction delta = -284608 -0.4774%
time delta = -0.434 ms -1.0435%
The number of PerfClassTraceTime's used is reduced from 564 to
116 (so we have an overhead of about 715 instructions per use,
yikes!).