On 1/15/15 5:09 AM, Ivan Gerasimov wrote:
Hello everyone!

This is yet another iteration in the attempt to solve the 'wrong exit code' bug on Windows [1]. The wrong exit code has been observed once again with one of the regression tests.

The suspicion is that this might be because the critical section was destroyed before all the child threads had terminated. In that case, one of the threads could have been unblocked and called _endthreadex(), which resulted in a race.

To address this scenario, it is proposed to make the thread that is about to exit the process raise a flag. If the flag is raised, any threads wishing to exit should instead suspend themselves.

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8069048
WEBREV: http://cr.openjdk.java.net/~igerasim/8069048/0/webrev/

src/os/windows/vm/os_windows.cpp
    line 3895: // don't let the current thread to proceed to _endthreadex()
        Typo: 'let the current thread to proceed to'
           -> 'let the current thread proceed to'

    Just making sure that I understand the revised algorithm:

    - before the EPT_PROCESS thread gets here, EPT_THREAD threads
      will work as before and call line 3909 _endthreadex()

    - after the EPT_PROCESS thread gets here and sets the flag
      on line 3886: OrderAccess::release_store(&process_exiting, 1);

      - an EPT_THREAD thread may have made it past the flag check on
        line 3802:
          } else if (OrderAccess::load_acquire(&process_exiting) == 0) {
        but it will be blocked on line 3803:
          EnterCriticalSection(&crit_sect);

      - an EPT_THREAD thread that sees the flag set on line 3802 will
        drop into the self-suspend block on lines 3892-3900

    - after the EPT_PROCESS thread exits the critical section, then
      any EPT_THREAD threads that were blocked trying to acquire
      the critical section will now see the flag set on line 3805:
        if (what == EPT_THREAD && OrderAccess::load_acquire(&process_exiting) == 0) {
      and drop into the self-suspend block on lines 3892-3900

    Short version: any EPT_THREAD threads that arrive after the
    EPT_PROCESS thread owns the critical section will never call
    line 3909 _endthreadex() because they self-suspend.
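
    For illustration, the decision logic walked through above can be
    sketched in portable C++. This is not the webrev code: std::mutex
    stands in for the Windows critical section, std::atomic for the
    OrderAccess calls, and ExitGate, decide(), Action and
    demo_sequence() are hypothetical names. The sketch only reports
    which call a thread would make next; it omits the handle list and
    the wait that the EPT_PROCESS thread performs.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical names for this sketch; only EPT_THREAD, EPT_PROCESS,
// process_exiting and crit_sect come from the review above.
enum Ept { EPT_THREAD, EPT_PROCESS };
enum Action { CALL_ENDTHREADEX, SELF_SUSPEND, CALL_EXIT };

struct ExitGate {
    std::atomic<int> process_exiting{0};  // stands in for the HotSpot flag
    std::mutex crit_sect;                 // stands in for the CRITICAL_SECTION

    // Reports what the calling thread would do next instead of doing it.
    Action decide(Ept what) {
        if (what == EPT_PROCESS) {
            std::lock_guard<std::mutex> guard(crit_sect);
            // Raise the flag while still owning the critical section.
            process_exiting.store(1, std::memory_order_release);
            return CALL_EXIT;
        }
        // First check, made without the lock.
        if (process_exiting.load(std::memory_order_acquire) == 0) {
            std::lock_guard<std::mutex> guard(crit_sect);
            // Second check: the flag may have been raised while this
            // thread was blocked waiting for the critical section.
            if (process_exiting.load(std::memory_order_acquire) == 0) {
                return CALL_ENDTHREADEX;
            }
        }
        // Flag seen on either check: never reach _endthreadex().
        return SELF_SUSPEND;
    }
};

// Single-threaded walk through the three cases.
inline void demo_sequence() {
    ExitGate gate;
    assert(gate.decide(EPT_THREAD) == CALL_ENDTHREADEX);  // before the flag
    assert(gate.decide(EPT_PROCESS) == CALL_EXIT);        // raises the flag
    assert(gate.decide(EPT_THREAD) == SELF_SUSPEND);      // after the flag
}
```

    The second check under the lock is what covers a thread that
    passed the first check and then blocked on the critical section
    while the EPT_PROCESS thread owned it.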

    OK, I concur that this new algorithm looks correct and will reduce
    the number of threads racing through line 3909 _endthreadex() while
    the EPT_PROCESS thread is trying to call exit().

    One possible hole remains that we've discussed before. If an
    EPT_THREAD thread calls _endthreadex() before the EPT_PROCESS
    thread gets here, and if the EPT_THREAD thread stalls in
    _endthreadex(), then it's still possible for that EPT_THREAD
    thread to mess up the exit code if it unblocks after the
    EPT_PROCESS thread has set the exit code. We've discussed this
    before and there's nothing we can do about it other than try to
    reduce the probability by reducing the number of EPT_THREAD
    threads that are calling _endthreadex().

    Thumbs up!


Side note: A new failure of this mechanism was filed recently:

JDK-8069068 VM warning: WaitForMultipleObjects timed out (0) ...
https://bugs.openjdk.java.net/browse/JDK-8069068

    The bug was filed against JDK9-B45 so it has the most recent
    fix (https://bugs.openjdk.java.net/browse/JDK-8066863). The
    weird part is that WaitForMultipleObjects() timed out without
    an error code being set. Don't know if that means anything in
    particular in the Win* APIs...

    This fix (8069048) may also reduce the probability of this
    failure mode because we'll be queueing fewer threads on the
    handle list...

Dan



[1] https://bugs.openjdk.java.net/browse/JDK-6573254

Sincerely yours,
Ivan
