I may have found the problem and solution. We went back and had a look at
all the logs, not just catalina.out. There was a localhost log written by
Tomcat, and in it were a number of exceptions from when the application
was in that weird state. The exceptions were as follows:
java.io.IOException: Broken pipe
java.sql.SQLException: Connection has already been closed.
org.hibernate.TransactionException: JDBC begin transaction failed:
org.hibernate.TransactionException: Already have an associated managed connection

The last one repeats every second or so until the application is
restarted. This led me to a custom request filter that came from a book
about Hibernate, specifically a section about transaction management. It
says to create a filter and wrap a transaction around the filter chain.
The code in the filter is as follows:

    Session session = HibernateSessionFactory.getSession();
    Transaction tx = session.beginTransaction();
    try {
        chain.doFilter(request, response);
        tx.commit();
        session.close();
    } catch (Throwable ex) {
        try {
            if (tx.isActive()) {
                tx.rollback();
                session.close();
            }
        } catch (Throwable rbEx) {
            rbEx.printStackTrace();
        }
        throw new ServletException(ex);
    }

The first exception, the broken pipe, indicates that a connection was
lost, probably from mobile users losing signal or something similar.
Once the connection is lost, it gets closed somewhere along the way. So
if the connection is lost during chain.doFilter(), the subsequent
tx.commit() fails because the connection is already closed. Control
falls through to the catch block, and I assume tx.isActive() is false at
that point, so session.close() is never called. The next request then
fails on session.beginTransaction(), and since we are forever throwing
exceptions before ever reaching chain.doFilter(), session.close() is
never called and the app can no longer fully process requests. This
theory matches the order of the thrown exceptions in the logs.

I can reproduce the issue locally by adding a breakpoint to an action
class and letting Eclipse sit there for a couple of minutes. I assume
the JDBC driver closes the connection, because when I resume the
application I see the same symptoms as the issue we've been having.

With the following updated code, only one exception about the connection
already being closed is thrown, but the application recovers and is
still usable. I may still add a catch for that SQLException and just log
a message about the connection being lost or something.

    Session session = HibernateSessionFactory.getSession();
    Transaction tx = session.beginTransaction();
    try {
        chain.doFilter(request, response);
        if (session.isOpen() && tx.isActive())
            tx.commit();
    } catch (Throwable ex) {
        try {
            if (session.isOpen() && tx.isActive())
                tx.rollback();
        } catch (Throwable rbEx) {
            rbEx.printStackTrace();
        }
        throw new ServletException(ex);
    } finally {
        if (session.isOpen())
            session.close();
    }
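If I do add that catch, I'd expect it to look roughly like the sketch
below; this is just the catch block from the updated code, where "log"
stands in for whatever logger the app uses, and I'm assuming the
SQLException surfaces somewhere in the cause chain:

    } catch (Throwable ex) {
        try {
            if (session.isOpen() && tx.isActive())
                tx.rollback();
        } catch (Throwable rbEx) {
            rbEx.printStackTrace();
        }
        // Walk the cause chain; if the root problem is a dropped/closed
        // connection, log it quietly instead of rethrowing.
        // "log" is a placeholder for the application's logger.
        for (Throwable t = ex; t != null; t = t.getCause()) {
            if (t instanceof java.sql.SQLException) {
                log.warn("Database connection lost during request: "
                        + t.getMessage());
                return; // the finally block still closes the session
            }
        }
        throw new ServletException(ex);
    }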
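And in case it helps anyone who lands on this thread later, here is a
minimal, self-contained version of the corrected filter. The class name
is mine, and HibernateSessionFactory is our own helper that hands out
the Session; the rest is the standard javax.servlet Filter API:

    import java.io.IOException;

    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;

    import org.hibernate.Session;
    import org.hibernate.Transaction;

    public class HibernateTransactionFilter implements Filter {

        public void init(FilterConfig config) throws ServletException {
            // nothing to initialize
        }

        public void doFilter(ServletRequest request, ServletResponse response,
                             FilterChain chain) throws IOException, ServletException {
            Session session = HibernateSessionFactory.getSession();
            Transaction tx = session.beginTransaction();
            try {
                chain.doFilter(request, response);
                // Guard the commit: the connection may have died mid-request.
                if (session.isOpen() && tx.isActive())
                    tx.commit();
            } catch (Throwable ex) {
                try {
                    if (session.isOpen() && tx.isActive())
                        tx.rollback();
                } catch (Throwable rbEx) {
                    rbEx.printStackTrace();
                }
                throw new ServletException(ex);
            } finally {
                // Always release the session, even if commit/rollback blew
                // up, so the next request starts with a clean state.
                if (session.isOpen())
                    session.close();
            }
        }

        public void destroy() {
            // nothing to clean up
        }
    }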
We'll deploy the fix to the two most problematic applications tomorrow,
and hopefully we won't see the issue anymore. Thanks, everyone, for the
comments about the logs; that prompted me to go have a deeper look at
everything Tomcat was logging. I'm still confused as to why the
exceptions never made it to catalina.out, though.

On Mon, Aug 6, 2018 at 2:25 PM Mark Thomas <ma...@apache.org> wrote:
>
> On 06/08/2018 16:54, Chrifister wrote:
>
> <snip/>
>
> > Any help would be greatly appreciated.
>
> No real ideas. Just requests for more information and an observation.
>
> The 500 responses should have triggered stack traces in the logs. Can
> you provide some sample stack traces of the errors you are seeing.
>
> What does a thread dump show?
>
> Are you accessing Tomcat directly or via a reverse proxy such as httpd?
>
> That only one app of several gets into this state while other
> applications work correctly points towards an application issue rather
> than anything else.