Hello,

Using Tomcat 8.0.23 and Tomcat 8.0.30 with Java 1.7.0_25 on CentOS 5.11, we are getting a stuck thread when our server is under high(er) load. It seems to happen when one of our non-container threads invokes AsyncContext.complete() while the state field of the AsyncStateMachine class is set to STARTING instead of STARTED, which results in Object.wait() being invoked in the pauseNonContainerThread() method. The thread never seems to recover from this.

I tried a couple of things within our source code trying to get around this:

1. If HttpServletRequest.isAsyncStarted() returns true, invoke
   AsyncContext.complete(); if it returns false, sleep for an
   arbitrary time and try again. Even when
   HttpServletRequest.isAsyncStarted() returned true before I invoked
   AsyncContext.complete(), we occasionally still got the described
   problem. I'm not sure whether HttpServletRequest.isAsyncStarted()
   correlates with the AsyncStateMachine's state field.
2. I tried a similar approach using an AsyncListener, but again
   without consistent success.
3. Even after HttpServletRequest.isAsyncStarted() returned true, I
   tried enforcing an arbitrary delay before invoking
   AsyncContext.complete(), but again without consistent success.
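For reference, workaround 1 amounts to a bounded "check, then complete, else sleep and retry" loop. The sketch below is hypothetical (CompleteWhenStarted and tryComplete() are illustrative names, not servlet or Tomcat API) and takes the check and the completion as plain functional interfaces so it runs without a servlet container. Note that it is a check-then-act sequence: nothing prevents the underlying state from changing between the check and the call, which is consistent with it not fixing the problem reliably.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper sketching workaround 1; not part of any servlet or
// Tomcat API. Polls a "started" check and only then runs the completion.
class CompleteWhenStarted {
    // Returns true if 'complete' was run, false if the check never passed
    // within maxAttempts tries.
    static boolean tryComplete(BooleanSupplier isStarted, Runnable complete,
                               int maxAttempts, long sleepMillis)
            throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (isStarted.getAsBoolean()) {
                // Check-then-act: the real state may still change between
                // this check and the call below.
                complete.run();
                return true;
            }
            Thread.sleep(sleepMillis);
        }
        return false;
    }
}
```

In the servlet code this would be called as something like tryComplete(request::isAsyncStarted, asyncContext::complete, 10, 50), where the attempt count and sleep time are arbitrary values.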

I also tried changing Tomcat's source code to get around this:

1. Instead of invoking Object.wait(), I tried invoking
   Object.wait(long timeout) and checking whether the state field
   ever changes from STARTING to STARTED in this situation. This
   didn't help either: the state field seems to stay stuck in
   STARTING.

        private synchronized void pauseNonContainerThread() {
            while (!ContainerThreadMarker.isContainerThread() &&
                    state.getPauseNonContainerThread()) {
                try {
                    System.out.println("*** pauseNonContainerThread ***" +
                            " (before wait :: state: '" + state + "')");
                    wait(500); // changed from wait() for this experiment
                    System.out.println("*** pauseNonContainerThread ***" +
                            " (after wait)");
                } catch (InterruptedException e) {
                    // TODO Log this?
                }
            }
        }
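A variant of the wait(500) experiment that gives up after an overall deadline, instead of looping indefinitely, makes it easier to show in the logs that the state never left STARTING. This is a minimal sketch under assumptions, not Tomcat code: PausableState, awaitStarted() and markStarted() are hypothetical names loosely modelled on the excerpt above.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model, NOT Tomcat's AsyncStateMachine: it only mimics the
// "wait while STARTING" guard from pauseNonContainerThread().
class PausableState {
    enum State { STARTING, STARTED }

    private State state = State.STARTING;

    // Called by the thread that finishes the dispatch; wakes any waiters.
    synchronized void markStarted() {
        state = State.STARTED;
        notifyAll();
    }

    // Waits until the state leaves STARTING or the overall timeout expires.
    // Returns true if STARTED was observed, false on timeout.
    synchronized boolean awaitStarted(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (state == State.STARTING) {
            long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMillis <= 0) {
                // Never call wait(0) here: that would wait forever.
                System.out.println("*** still STARTING after " + timeoutMillis + " ms ***");
                return false;
            }
            wait(remainingMillis);
        }
        return true;
    }
}
```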

I can reproduce the issue, but it takes quite a few load-test runs before it shows up. It also seems to be harder to reproduce on Tomcat 8.0.30 than on Tomcat 8.0.23. I don't have an isolated test case (yet) that I can upload somewhere, but I can provide logs and thread dumps if needed.

In addition, looking at the Tomcat source code, I couldn't figure out what is supposed to invoke notify() (or notifyAll()) to wake up the non-container thread once Tomcat has put it into wait(). Since Tomcat decides to call wait() on it, I would expect Tomcat to also be responsible for eventually calling notify() once the right conditions are met. I'm not saying this is the problem, though; I'm just trying to understand what should happen once a thread gets into this state.
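For reference, Object.wait() releases the monitor of the object it is called on (here the AsyncStateMachine instance, since pauseNonContainerThread() is synchronized), and the waiting thread resumes only when another thread calls notify() or notifyAll() on that same object, when it is interrupted, when a timed wait expires, or on a spurious wakeup. The hypothetical model below (Handshake and its methods are illustrative names, not Tomcat code) shows the difference between changing the state and notifying versus changing the state silently.

```java
// Hypothetical model of the wait/notify handshake; names are illustrative,
// not Tomcat's API.
class Handshake {
    private boolean starting = true;

    // Non-container thread side: blocks until another thread clears
    // 'starting' AND notifies this same object.
    synchronized void pauseWhileStarting() throws InterruptedException {
        while (starting) {
            wait(); // releases this object's monitor while waiting
        }
    }

    // Container thread side, correct version: change state, then notify.
    synchronized void finishStartingAndNotify() {
        starting = false;
        notifyAll();
    }

    // Container thread side, broken version: the state changes but nobody
    // is woken, so a thread already in wait() stays parked (barring a
    // spurious wakeup or an interrupt).
    synchronized void finishStartingSilently() {
        starting = false;
    }

    // Starts a daemon thread that calls pauseWhileStarting() and returns it.
    static Thread startWaiter(final Handshake h) {
        Thread t = new Thread(() -> {
            try {
                h.pauseWhileStarting();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

A thread dump of the "silent" case would show the waiter in WAITING (on object monitor) inside pauseWhileStarting(), even though starting is already false.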

Regards,
Jeroen...
