On 7/14/16 12:46 PM, Ivan Gerasimov wrote:
Thank you David for looking into this!

Here's the webrev updated in accordance with your and Daniel's suggestions:
http://cr.openjdk.java.net/~igerasim/8160892/01/webrev/

src/os/windows/vm/os_windows.cpp
    L3900:     bool registered = false;
        Please consider adding a comment above this variable:

        // We only attempt to register threads until a process exiting
        // thread manages to set the process_exiting flag. Any threads
        // that come through here after the process_exiting flag is set
        // are unregistered and will be caught in the SuspendThread()
        // infinite loop below.

        My request is different than David's. I prefer the comment up
        here by the 'registered' variable because that variable is
        key for getting into the SuspendThread() infinite loop. The
        comment also occurs before all the code paths that you have
        trace out with the different Ept types and flags states.

I'm really hoping that this is the last tweak we need to make to
clearly comment what we're doing to alleviate this OS race.
The number of occurrences of the underlying bug keep going down
with every iteration so we're getting close.

Dan



Please see my answers inline


Nit: can we change 'registered_itself" to just "registered" please.
Done.


Can you explain under what conditions a thread will now reach the self-suspension code. Is that only if an error occurred such that we were unable to register our handle for the process-exiting thread to wait on? If so some commentary on that block seems appropriate - perhaps more appropriate there than back up at the place where it failed to get the handle (as Dan requested).

There are three kinds of threads, which can be caught in that self-suspension loop: 1) All threads that want to end (by calling _endthreadex()) *after* some process-exiting thread raised the flag `process_exiting`. The rationale here is that we know that the whole process is going to be terminated quite soon, so we do not allow any thread to call _endthreadex(), which seems to have the concurrency bug. 2) Any thread that wants to end the whole process, after some other thread raised the flag `process_exiting`. If more than one threads want to end the process, we let to do it only the thread that could raise the flag `process_exiting`. All other such threads will have to suspend themselves. 3) (Unlikely to happen in practice) Any thread that wants to end by calling _endthreadex(), but which failed to register itself due to failure of DuplicateHandle(). Here we still have a race, which can result in a wrong exit code of the process.

Given we keep missing conditions I'm only cautiously optimistic about this. And I'd like to understand how we still sometimes end up exiting with an "error code" that seems to be the value of an exception! :(

The last time the sentinel exit code =20115 was reported almost a year ago. After that the fix for JDK-8145127 had gone in, and I didn't see any more reports about wrong exit codes since then. In particular, that fix worked around the situation when more than one threads concurrently call System.exit(), which might have caused a race.

With kind regards,
Ivan


Thanks,
David

With kind regards,
Ivan




Reply via email to