Hi Chris,

On 10/07/2018 4:22 AM, Chris Plummer wrote:
Hi David,

Would it be better to problem list this test on solaris using JDK-8156708. That way when JDK-8156708 is fixed it can come off the problem list and start executing on solaris.

JDK-8156708 is already fixed - it's a dup of JDK-8154715. We could only fix this for VM created threads. The general problem of TLS destructors looping if a thread terminates without detaching from the VM is not solvable - other than by not using TLS in the VM.

Thanks,
David

thanks,

Chris

On 7/8/18 4:58 PM, David Holmes wrote:
tl;dr skip the new regression test on Solaris

New webrev:

http://cr.openjdk.java.net/~dholmes/8205878/webrev.v3/

This excludes the test from running on Solaris, so the makefile doesn't bother compiling this native test and the Java part of the test adds:

! * @requires os.family != "windows" & os.family != "solaris"
  * @summary Basic test of Thread and ThreadMXBean queries on a natively
  *          attached thread that has failed to detach before terminating. + * @comment The native code only supports POSIX so no windows testing; also + *          we have to skip solaris as a terminating thread that fails to + *          detach will hit an infinite loop due to TLS destructor issues - see
+ *          comments in JDK-8156708

Note this means that Solaris is not affected by the original issue because a still-attached native thread can't actually terminate due to the TLS destructor infinite-loop issue.

Thanks,
David

On 6/07/2018 6:07 PM, David Holmes wrote:
<sigh> The new test is hanging on Solaris. I just discovered we don't run these tests on Solaris until tier4.

David

On 6/07/2018 8:40 AM, David Holmes wrote:
Hi Chris,

Thanks for looking at this.

Updated webrev:

http://cr.openjdk.java.net/~dholmes/8205878/webrev.v2/

Only real changes in ji05t001.c. (And fixed typo in the new test)

More below ...

On 6/07/2018 7:55 AM, Chris Plummer wrote:
Hi David,

Solaris problems aside, overall it looks fine. Some minor things I noted:

I noticed that exitCode is never modified in agentA() or agentB(), so there isn't much point to having it. If you reach the bottom of the function, it passed, so PASSED can be returned. The code would be more clear if it did this. As-is it is implied that you can reach the bottom when it fails.

I resisted any and all urges to do any kind of unrelated code cleanup in the tests - once you start you may end up doing a full rewrite.

Is detaching the threads along the failure paths really needed? exit() is called, so this would seem to make it unnecessary.

You're right that isn't necessary. I'll remove the changes from before the exits in ji05t001.c

I prefer assignments not to be embedded inside the "if" condition. The DetachCurrentThread code in THREAD_return() is much more readable than the similar code in agentA() and agentB().

It's an existing style already used in that test e.g.

  287     if ((res =
  288             JNI_ENV_PTR(vm)->AttachCurrentThread(
  289                 JNI_ENV_ARG(vm, (void **) &env), (void *) 0)) != 0) {

and I don't mind it, so I'd prefer not to change it.

In the test:

   54         // Generally as long as we don't crash of throw unexpected    55         // exceptions then the test passes. In some cases we know exactly

"of" should be "or".

Well spotted. Thanks.

Shouldn't you be catching exceptions for all the Thread methods you are calling? Otherwise the test will exit if one is thrown, and the above comment indicates that you don't want this.

I'm not expecting there to be any exceptions from any of the called methods. That would potentially indicate a problem in handling the terminated native thread, so would indicate a test failure.

Don't we normally put these tests in a package?

Doesn't seem to be any hard and fast rule. I only uses packages when they are important for the test. In runtime we have 905 java files and only 116 have a package statement. It varies elsewhere.

Thanks,
David

thanks,

Chris

On 7/5/18 2:58 AM, David Holmes wrote:
<sigh> Solaris compiler complains about doing a return from inside a do-while loop. I'll have to rework part of the fix tomorrow.

David

On 5/07/2018 6:19 PM, David Holmes wrote:
Bug: https://bugs.openjdk.java.net/browse/JDK-8205878
Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/

Problem:

The tests create native threads that attach to the VM through JNI_AttachCurrentThread but which then terminate without detaching themselves. When the VM exits and we're using Flight Recorder "dumponexit" this leads to a call to VM_PrintThreads that in part wants to print the per-thread CPU usage. When we encounter the threads that have terminated already the low level pthread_getcpuclockid calls returns ESRCH but the code doesn't expect that and so fails an assert in debug mode and can SEGV in product mode.

Solution:

Serviceability-side: fix the tests

Change the tests so that the threads detach before terminating. The two tests are (surprisingly) written in completely different styles, so the solution also takes on two different styles.

Runtime-side: make the VM more robust in the fact of JNI attached threads that terminate before detaching, and add a regression test

I took a good look at the low-level code for interacting with arbitrary threads and as far as I can see the problem only exists for this one case of pthread_getcpuclockid on Linux. Elsewhere the potential for a library call failure just reports an error value (such as -1 for the cpu time used).

So the fix is simply to allow for ESRCH when calling pthread_getcpuclockid and return -1 for the cpu usage in that case.

I created a new regression test to create a new native thread, attach it and then let it terminate while still attached. The java code then calls various Thread and ThreadMXBean functions on it to ensure there are no crashes or unexpected exceptions.

Testing:
  - old tests with fixed run-time
  - old run-time with fixed tests
  - mach tier4 (which exposed the problem - that's where we enable Flight recorder for the tests) [in progress]
  - mach5 tier 1-3 for good measure [in progress]
  - new regression test

Thanks,
David





Reply via email to