Hi Richard, Thanks for the quick reply.
> > > With DeoptimizeObjectsALot enabled internal threads are started that > > > deoptimize frames and > > > objects. The number of threads started are given with > > > DeoptimizeObjectsALotThreadCountAll and > > > DeoptimizeObjectsALotThreadCountSingle. The former targets all > existing > > > threads whereas the > > > latter operates on a single thread selected round robin. > > > > > > I removed the mode where deoptimizations were performed at every nth > > > exit from the runtime. I never used it. > > > Do I get it right? You have a n:1 and a n:all test scenario. > > n:1: n threads deoptimize 1 Jana thread where n => DOALThreadCountSingle > > n:m: n threads deoptimize all Java threads where n = DOALThreadCountAll? > > Not quite. > > -XX:+DeoptimizeObjectsALot // required > -XX:DeoptimizeObjectsALotThreadCountAll=m > -XX:DeoptimizeObjectsALotThreadCountSingle=n > > Will start m+n threads. Each operating on all existing JavaThreads using > EscapeBarriers. The > difference between the 2 thread types is that one distinct EscapeBarrier > targets either just a > single thread or all exisitng threads at onece. If just one single thread is > targeted per > EscapeBarrier, then it is not always the same thread, but threads are selected > round robin. So there > will be n threads selecting independently single threads round robin per > EscapeBarrier and m threads > that target all threads in every EscapeBarrier. Ok, yes, that is how I understood it. > > > * EscapeBarrier::sync_and_suspend_one(): use a direct handshake and > > > execute it always independently > > > of is_thread_fully_suspended(). > > Is this also a performance optimization? > > Maybe a minor one. OK > > > * JavaThread::wait_for_object_deoptimization(): > > > - Bugfix: the last check of is_obj_deopt_suspend() must be /after/ the > > > safepoint check! This > > > caused issues with not walkable stacks with DeoptimizeObjectsALot. > > OK. As I understand, there was one safepoint check in the old version, > > now there is one in each iteration. I assume this is intended, right? > > Yes it is. The important thing here is (A) a safepoint check is needed /after/ > leaving a safe state > (_thread_in_native, _thread_blocked). (B) Shared variables that are modified > at safepoints or with handshakes need to be reread /after/ the safepoint > check. > > BTW: I only noticed now that since JDK-8240918 JavaThreads themselves > must disarm their polling > page. Originally (before handshakes) this was done by the VM thread. With > handshakes it was done by > the thread executing the handshake op. This was changed for > OrderAccess::cross_modify_fence() where > the poll is left armed if the thread is in native and sice JDK-8240918 it is > always left armed. So > when a thread leaves a safe state (native, blocked) and there was a > handshake/vm op, it will always > call SafepointMechanism::block_if_requested_slow(), even if the > handshake/vm operation have been > processed already and everybody else is happyly executing bytecodes :) Ok. > Still (A) and (B) hold. > > > - Added limited spinning inspired by HandshakeSpinYield to fix > > > regression in > > > microbenchmark [1] > > Ok. Nice improvement, nice catch! > > Yes. It certainly took some time to find out. > > > > > > > I refer to some more changes answering your questions and comments > inline > > > below. > > > > > > Thanks, > > > Richard. > > > > > > [1] Microbenchmark: > > > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.microbe > nchmark/ > > > > > > > > > I understand you annotate at safepoints where the escape analysis > > > > finds out that an object is "better" than global escape. > > > > This are the cases where the analysis identifies optimization > > > > opportunities. These annotations are then used to deoptimize > > > > frames and the objects referenced by them. > > > > Doesn't this overestimate the optimized > > > > objects? E.g., eliminate_alloc_node has many cases where it bails > > > > out. > > > > > > Yes, the implementation is conservative, but it is comparatively simple > and > > > the additional debug > > > info is just 2 flags per safepoint. > > Thanks. It also helped that you explained to me offline that > > there are more optimizations than only lock elimination and scalar > > replacement done based on the ea information. > > The ea refines the IR graph with allows follow up optimizations > > which can not easily be tracked back to the escaping objects or > > the call sites where they do not escape. > > Thus, if there are non-global escaping objects, you have to > > deoptimize the frame. > > Did I repeat that correctly? > > Mostly, but there are also cases where deoptimization is required if and only > if ea-local objects > are passed as arguments. This is the case when values are not read directly > from a frame, but from a callee frame. Hmm, don't get this completely, but ok. > > > Accesses to instance > > > members or array elements can be optimized as well. > > You mean the compiler can/will ignore volatile or memory ordering > > requirements for non-escaping objects? Sounds reasonable to do. > > Yes, for instance. Also without volatile modifiers it will eliminate accesses. > Here is an example: > Method A has a NoEscape allocation O that is not scalar replaced. A calls > Method B, which is not > inlined. When you use your debugger to break in B, then modify a field of O, > then this modification > would have no effect without deoptimization, because the jit assumes that B > cannot modify O without > a reference to it. Yes, A can keep O in a register, while the JVMTI thread would write to the location in the stack where the local is held (if it was written back). > > > > Syncronization: looks good. I think others had a look at this before. > > > > > > > > EscapeBarrier::deoptimize_objects_internal() > > > > The method name is misleading, it is not used by > > > > deoptimize_objects(). > > > > Also, method with the same name is in Deopitmization. > > > > Proposal: deoptimize_objects_thread() ? > > > > > > Sorry, but I don't see, why it would be misleading. > > > What would be the meaning of 'deoptimize_objects_thread'? I don't > > > understand that name. > > 1. I have no idea why it's called "_internal". Because it is private? > > By the name, I would expect that EscapeBarrier::deoptimize_objects() > > calls it for some internal tasks. But it does not. > > Well, I'd say it is pretty internal, what's happening in that method. So IMHO > the suffix _internal > is a match. > > > 2. My proposal: deoptimize_objects_all_threads() iterates all threads > > and calls deoptimize_objects(_one)_thread(thread) for each of these. > > That's how I would have named it. > > But no bike shedding, if you don't see what I mean it's not obvious. > Ok. We could have a quick call, too, if you like. Ok, I think I have understood the remaining points. I'm fine with this so far. Thanks, Goetz.