Serguei, On Sun, 2016-05-08 at 17:25 -0700, serguei.spit...@oracle.com wrote: > On 5/8/16 03:58, serguei.spit...@oracle.com wrote: > > Severin, > > > > The JPRT job failed: 2016-05-08-100525.sspitsyn.8153711 with a > > NullPointerException. > > The log link is (you can also find the most relevant fragment of the log in > > the bug report): > > > > http://scaaa637.us.oracle.com//archive/2016/05/2016-05-08-100525.sspitsyn.8153711//logs/solaris_x64_5.11-fastdebug-c2-jdk_svc_sanity.log > > Sorry for providing the link that is not accessible for you. > But, please, find a fragment of the failure log in my bug report comment.
Thanks for your help. Time for me to find a Solaris box and try to reproduce. Cheers, Severin > > Thanks, > Serguei > > > > > Please, find the email notification in the attachment. > > > > Thanks, > > Serguei > > > > > > On 5/8/16 00:59, serguei.spit...@oracle.com wrote: > > > Hi Severin, > > > > > > I filed two new bugs to cover the discovered issues: > > > 8156498: more places in the invoke.c that need protection with the > > > invokerLock > > > 8156500: deadlock provoked by new stress test > > > com/sun/jdi/OomDebugTest.java > > > > > > > > > Will try to push your two fixes today or tomorrow. > > > I know, you've just got a committer status. > > > But I'm not sure you are comfortable to push the fixes yourself at this > > > time. > > > > > > Thanks, > > > Serguei > > > > > > > > > > > > On 5/3/16 03:21, serguei.spit...@oracle.com wrote: > > > > Hi Severin, > > > > > > > > Please, find my comments below. > > > > > > > > > > > > On 5/2/16 01:44, Severin Gehwolf wrote: > > > > > On Fri, 2016-04-29 at 12:33 -0700, serguei.spit...@oracle.com wrote: > > > > > > On 4/29/16 01:56, Severin Gehwolf wrote: > > > > > > > Hi Serguei, > > > > > > > > > > > > > > On Fri, 2016-04-29 at 01:34 -0700, serguei.spit...@oracle.com > > > > > > > wrote: > > > > > > > > Hi Severin, > > > > > > > > > > > > > > > > The fix looks good in general. > > > > > > > > I'm testing both fixes together at the moment. > > > > > > > That is JDK-8154529 and JDK-8153711? Yes, that's what I've done > > > > > > > too. > > > > > > > > A couple of questions... > > > > > > > > > > > > > > > > It seems, there are more places where an invokerLock critical > > > > > > > > section is missed. > > > > > > > Right. > > > > > > > > The following functions: > > > > > > > > - invoker_enableInvokeRequests > > > > > > > This should be fixed with the patch for JDK-8154529 > > > > > > > > - invokeConstructor > > > > > > > > - invokeStatic > > > > > > > > - invokeNonvirtual > > > > > > > > - invokeVirtual > > > > > > > > - saveGlobalRef > > > > > > > Correct. The intent would be to fix the callsites of > > > > > > > saveGlobalRef. If > > > > > > > we fix invoke* then saveGlobalRef should not be an issue. I > > > > > > > didn't want > > > > > > > to include this in this fix since it's pretty hairy and would > > > > > > > like to > > > > > > > fix this in incremental steps. > > > > > > > > > > > > > > > The first function is easy to fix. > > > > > > > > The last 5 functions are called from the invoker_doInvoke() > > > > > > > > that we already had a problem with. > > > > > > > > I'm puzzled with the question: How to synchronize and avoid > > > > > > > > deadlocks at the same time? > > > > > > > I'm not sure yet. Perhaps locking only while saveGlobalRef is > > > > > > > being > > > > > > > called in invoke* functions will help. > > > > > > > > > > > > > > > I'm inclined to let your fix go (if the testing is Ok) and file > > > > > > > > one more bug on the remaining sync issues. > > > > > > > Please keep me in the loop about your test results. > > > > > > Both the JTreg com/sun/jdi and the co-located nsk.jdi tests are > > > > > > all passed. > > > > > > I also ran the 4 previously failed tests in big loops of 1000 runs: > > > > > > com/sun/jdi/InvokeTest.java > > > > > > com/sun/jdi/InvokeHangTest.java > > > > > > com/sun/jdi/InterfaceMethodsTest.java > > > > > > com/sun/jdi/OomDebugTest.java (new test introduced in the > > > > > > webrev) > > > > > I suggest to run InvokeTest, InvokeHangTest and InterfaceMethodsTest > > > > > in > > > > > a loop. Those never failed for me in such a scenario. > > > > > > > > > > > The OomDebugTest.java failed with a timeout (most likely, a > > > > > > deadlock). > > > > > > Please, find the OomDebugTest.jtr file in attachments. > > > > > Correct. This is what I was seeing. See the last comment of the bug: > > > > > https://bugs.openjdk.java.net/browse/JDK-8153711 > > > > > > > > Need to check if it is the same as I'm seeing. > > > > > > > > > > > > > > It has the jstack output of the hung OomDebugTestTarg JVM. I'm not > > > > > convinced this is the same failure we were seeing in JDK-4858370 > > > > > since > > > > > the stack suggests it's doing a GC upon a newInstance of a primitive > > > > > array. It also does not seem like the same issue as the deadlocks > > > > > exposed by locking during an invoke, because it was reproducible > > > > > fairly > > > > > consistently with InvokeHangTest. > > > > > > > > Agreed. > > > > > > > > > > > > > > > > > > It looks to me like a new issue. Probably one which was there before, > > > > > but is only exposed by the new test. The new test stress-tests the GC > > > > > with a debugger attached. Of course, this is going to be hard to > > > > > prove > > > > > since it would just run out of memory prior the patch. > > > > > > > > Thanks. > > > > Your analysis seems to be reasonable. > > > > I tend to agree with it but need more time to convince myself. > > > > > > > > > > > > > Thoughts? > > > > > > > > Let me double check the above. > > > > If the analysis is correct then I'd suggest to file new bug and push > > > > your fix as is. > > > > > > > > > > > > Thanks, > > > > Serguei > > > > > > > > > > > > > > > > > > Cheers, > > > > > Severin > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > Serguei > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for your help! > > > > > > > > > > > > > > Cheers, > > > > > > > Severin > > > > > > > > > > > > > > > Thanks, > > > > > > > > Serguei > > > > > > > > > > > > > > > > > > > > > > > > On 4/28/16 02:00, Severin Gehwolf wrote: > > > > > > > > > On Tue, 2016-04-19 at 19:32 -0700, serguei.spit...@oracle.com > > > > > > > > > wrote: > > > > > > > > > > > Hi Severin, > > > > > > > > > > > > > > > > > > > > > > I postpone a push for this fix. > > > > > > > > > > > > > > > > > > > > > > There are two nsk.jdi test failures (they look like > > > > > > > > > > > deadlocks): > > > > > > > > > > > nsk/jdi/ObjectReference/invokeMethod/invokemethod012 > > > > > > > > > > > FAIL(TIMEOUT) > > > > > > > > > > > nsk/jdi/Scenarios/invokeMethod/popframes001 > > > > > > > > > > > FAIL(TIMEOUT) > > > > > > > > > > > > > > > > > > > > > > and one jtreg sun/com/jdi failure (it looks like a > > > > > > > > > > > deadlock too): > > > > > > > > > > > com/sun/jdi/InvokeHangTes.java > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I'll double check if these failures are regressions > > > > > > > > > > > caused by your fix > > > > > > > > > > > or not. > > > > > > > > > > I confirm, the failures above are new regressions > > > > > > > > > > introduced by the fix. > > > > > > > > > > The tests fail consistently with the fix and do not fail > > > > > > > > > > without the fix. > > > > > > > > > OK this was caused by the locking done in > > > > > > > > > invoker_doInvoke(). Note that > > > > > > > > > holding either of them invoker lock or event handler lock > > > > > > > > > causes this. > > > > > > > > > > > > > > > > > > Here is a new webrev with those hunks removed. It's > > > > > > > > > sufficient to grab > > > > > > > > > the locks again in invoke_completeInvokeRequest() when > > > > > > > > > clearing the > > > > > > > > > global references in order to not get those failures we've > > > > > > > > > seen when > > > > > > > > > the fix for JDK-4858370 was in place. > > > > > > > > > > > > > > > > > > http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8153711/webrev.02/ > > > > > > > > > > > > > > > > > > > > > > > > > > > Testing done: > > > > > > > > > - com/sun/jdi/InvokeTest.java > > > > > > > > > com/sun/jdi/InvokeHangTest.java and > > > > > > > > > sun/jdi/InterfaceMethodsTest.java does not fail in 1500 > > > > > > > > > runs. > > > > > > > > > - regular com/sun/jdi test set: no regressions > > > > > > > > > > > > > > > > > > Note that there are some rare cases where OomDebugTest times > > > > > > > > > out which > > > > > > > > > seems to be caused by the GC, though. See the bug for > > > > > > > > > details. Since > > > > > > > > > this problem is rare, it's still worthwhile having that test > > > > > > > > > included. > > > > > > > > > If it turns out a problem in practice we could consider > > > > > > > > > disabling the > > > > > > > > > test. > > > > > > > > > > > > > > > > > > Thoughts? > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > Severin > > > > > > > > > > > > > > > > > > > > > > > >