Thank you, David, for looking into this!
Here's the webrev updated in accordance with your and Daniel's suggestions:
http://cr.openjdk.java.net/~igerasim/8160892/01/webrev/
Please see my answers inline.
Nit: can we change "registered_itself" to just "registered", please?
Done.
Can you explain under what conditions a thread will now reach the
self-suspension code? Is that only if an error occurred such that we
were unable to register our handle for the process-exiting thread to
wait on? If so some commentary on that block seems appropriate -
perhaps more appropriate there than back up at the place where it
failed to get the handle (as Dan requested).
There are three kinds of threads that can be caught in that
self-suspension loop:
1) All threads that want to end (by calling _endthreadex()) *after* some
process-exiting thread raised the flag `process_exiting`.
The rationale here is that we know that the whole process is going to be
terminated quite soon, so we do not allow any thread to call
_endthreadex(), which seems to have the concurrency bug.
2) Any thread that wants to end the whole process, after some other
thread raised the flag `process_exiting`.
If more than one thread wants to end the process, we let only the
thread that managed to raise the flag `process_exiting` do it. All
other such threads will have to suspend themselves.
3) (Unlikely to happen in practice) Any thread that wants to end by
calling _endthreadex(), but which failed to register itself due to
failure of DuplicateHandle().
Here we still have a race, which can result in a wrong exit code of the
process.
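The three cases above can be sketched as a small decision function. This
is only an illustrative model, not the code from the webrev: apart from
the flag `process_exiting`, every name here (`Request`, `Action`,
`decide`, `registered`) is hypothetical. It also assumes that a thread
whose DuplicateHandle() call failed suspends itself only if it observes
the flag, and otherwise proceeds unregistered, which is where the
remaining race would be.

```cpp
#include <atomic>

// Hypothetical names; only `process_exiting` is taken from the discussion.
enum class Request { EndThread, EndProcess };
enum class Action { EndThread, EndThreadUnregistered, ExitProcess, SuspendSelf };

// Raised once by the single thread that is allowed to end the process.
std::atomic<bool> process_exiting{false};

// `registered` models whether DuplicateHandle() succeeded for a thread
// that wants to end; it is ignored for process-exit requests.
Action decide(Request what, bool registered) {
  if (what == Request::EndProcess) {
    bool expected = false;
    // Only the thread that manages to raise the flag ends the process.
    if (process_exiting.compare_exchange_strong(expected, true)) {
      return Action::ExitProcess;
    }
    return Action::SuspendSelf;  // case 2: another thread raised the flag first
  }
  if (process_exiting.load()) {
    // Case 1 (and case 3 when the flag is observed): do not let the
    // thread reach _endthreadex() while the process is exiting.
    return Action::SuspendSelf;
  }
  // Flag not raised yet. An unregistered thread that proceeds here is
  // the residual race: the exiting thread has no handle to wait on.
  return registered ? Action::EndThread : Action::EndThreadUnregistered;
}
```

In the real code the suspended threads never resume; the model only
returns the decision so that it can be checked in isolation.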
Given we keep missing conditions I'm only cautiously optimistic about
this.
And I'd like to understand how we still sometimes end up exiting with
an "error code" that seems to be the value of an exception! :(
The last time the sentinel exit code =20115 was reported was almost a
year ago. After that, the fix for JDK-8145127 went in, and I haven't
seen any more reports of wrong exit codes since then.
In particular, that fix worked around the situation where more than one
thread calls System.exit() concurrently, which might have caused a race.
With kind regards,
Ivan