Adding back runtime
On 10/07/2018 8:45 AM, Alex Menkov wrote:
+1
Thanks for looking at this Alex!
couple minor notes (no need to resend review)
Webrev updated in place (v3) for others to see.
src/hotspot/os/linux/os_linux.cpp
please replace
5581 }
5582 else {
with
} else {
Done.
test/hotspot/jtreg/runtime/jni/terminatedThread/libterminatedThread.c
please fix error reporting (I suppose you mean "TEST ERROR:
pthread_create failed"/"TEST ERROR: pthread_join failed"):
85 if ((res = pthread_create(&thread, NULL, thread_start, NULL)) !=
0) {
86 fprintf(stderr, "TEST ERROR: pthread_created failed: %s
(%d)\n", strerror(res), res);
87 exit(1);
88 }
89
90 if ((res = pthread_join(thread, NULL)) != 0) {
91 fprintf(stderr, "TEST ERROR: pthread_created failed: %s
(%d)\n", strerror(res), res);
92 exit(1);
93 }
Fixed - well spotted!
Thanks,
David
--alex
On 07/09/2018 15:17, David Holmes wrote:
Thanks Chris!
Can I please get a second review.
David
On 10/07/2018 7:50 AM, Chris Plummer wrote:
On 7/9/18 2:41 PM, David Holmes wrote:
Hi Chris,
On 10/07/2018 4:22 AM, Chris Plummer wrote:
Hi David,
Would it be better to problem list this test on solaris using
JDK-8156708. That way when JDK-8156708 is fixed it can come off the
problem list and start executing on solaris.
JDK-8156708 is already fixed - it's a dup of JDK-8154715. We could
only fix this for VM created threads. The general problem of TLS
destructors looping if a thread terminates without detaching from
the VM is not solvable - other than by not using TLS in the VM.
Ok, I misunderstood your comments in the test.
Changes look fine.
Chris
Thanks,
David
thanks,
Chris
On 7/8/18 4:58 PM, David Holmes wrote:
tl;dr skip the new regression test on Solaris
New webrev:
http://cr.openjdk.java.net/~dholmes/8205878/webrev.v3/
This excludes the test from running on Solaris, so the makefile
doesn't bother compiling this native test and the Java part of the
test adds:
! * @requires os.family != "windows" & os.family != "solaris"
* @summary Basic test of Thread and ThreadMXBean queries on a
natively
* attached thread that has failed to detach before
terminating.
+ * @comment The native code only supports POSIX so no windows
testing; also
+ * we have to skip solaris as a terminating thread that
fails to
+ * detach will hit an infinite loop due to TLS
destructor issues - see
+ * comments in JDK-8156708
Note this means that Solaris is not affected by the original issue
because a still-attached native thread can't actually terminate
due to the TLS destructor infinite-loop issue.
Thanks,
David
On 6/07/2018 6:07 PM, David Holmes wrote:
<sigh> The new test is hanging on Solaris. I just discovered we
don't run these tests on Solaris until tier4.
David
On 6/07/2018 8:40 AM, David Holmes wrote:
Hi Chris,
Thanks for looking at this.
Updated webrev:
http://cr.openjdk.java.net/~dholmes/8205878/webrev.v2/
Only real changes in ji05t001.c. (And fixed typo in the new test)
More below ...
On 6/07/2018 7:55 AM, Chris Plummer wrote:
Hi David,
Solaris problems aside, overall it looks fine. Some minor
things I noted:
I noticed that exitCode is never modified in agentA() or
agentB(), so there isn't much point to having it. If you reach
the bottom of the function, it passed, so PASSED can be
returned. The code would be more clear if it did this. As-is it
is implied that you can reach the bottom when it fails.
I resisted any and all urges to do any kind of unrelated code
cleanup in the tests - once you start you may end up doing a
full rewrite.
Is detaching the threads along the failure paths really needed?
exit() is called, so this would seem to make it unnecessary.
You're right that isn't necessary. I'll remove the changes from
before the exits in ji05t001.c
I prefer assignments not to be embedded inside the "if"
condition. The DetachCurrentThread code in THREAD_return() is
much more readable than the similar code in agentA() and agentB().
It's an existing style already used in that test e.g.
287 if ((res =
288 JNI_ENV_PTR(vm)->AttachCurrentThread(
289 JNI_ENV_ARG(vm, (void **) &env), (void *)
0)) != 0) {
and I don't mind it, so I'd prefer not to change it.
In the test:
54 // Generally as long as we don't crash of throw
unexpected
55 // exceptions then the test passes. In some cases
we know exactly
"of" should be "or".
Well spotted. Thanks.
Shouldn't you be catching exceptions for all the Thread methods
you are calling? Otherwise the test will exit if one is thrown,
and the above comment indicates that you don't want this.
I'm not expecting there to be any exceptions from any of the
called methods. That would potentially indicate a problem in
handling the terminated native thread, so would indicate a test
failure.
Don't we normally put these tests in a package?
Doesn't seem to be any hard and fast rule. I only uses packages
when they are important for the test. In runtime we have 905
java files and only 116 have a package statement. It varies
elsewhere.
Thanks,
David
thanks,
Chris
On 7/5/18 2:58 AM, David Holmes wrote:
<sigh> Solaris compiler complains about doing a return from
inside a do-while loop. I'll have to rework part of the fix
tomorrow.
David
On 5/07/2018 6:19 PM, David Holmes wrote:
Bug: https://bugs.openjdk.java.net/browse/JDK-8205878
Webrev: http://cr.openjdk.java.net/~dholmes/8205878/webrev/
Problem:
The tests create native threads that attach to the VM through
JNI_AttachCurrentThread but which then terminate without
detaching themselves. When the VM exits and we're using
Flight Recorder "dumponexit" this leads to a call to
VM_PrintThreads that in part wants to print the per-thread
CPU usage. When we encounter the threads that have terminated
already the low level pthread_getcpuclockid calls returns
ESRCH but the code doesn't expect that and so fails an assert
in debug mode and can SEGV in product mode.
Solution:
Serviceability-side: fix the tests
Change the tests so that the threads detach before
terminating. The two tests are (surprisingly) written in
completely different styles, so the solution also takes on
two different styles.
Runtime-side: make the VM more robust in the fact of JNI
attached threads that terminate before detaching, and add a
regression test
I took a good look at the low-level code for interacting with
arbitrary threads and as far as I can see the problem only
exists for this one case of pthread_getcpuclockid on Linux.
Elsewhere the potential for a library call failure just
reports an error value (such as -1 for the cpu time used).
So the fix is simply to allow for ESRCH when calling
pthread_getcpuclockid and return -1 for the cpu usage in that
case.
I created a new regression test to create a new native
thread, attach it and then let it terminate while still
attached. The java code then calls various Thread and
ThreadMXBean functions on it to ensure there are no crashes
or unexpected exceptions.
Testing:
- old tests with fixed run-time
- old run-time with fixed tests
- mach tier4 (which exposed the problem - that's where we
enable Flight recorder for the tests) [in progress]
- mach5 tier 1-3 for good measure [in progress]
- new regression test
Thanks,
David