Hi,

Our current setup is Tomcat 9.0.8 running on SuSE Enterprise. This server is
running a dozen web applications built with Struts 1.3.8 with some newer
Spring applications on the horizon. There is a large user base with some
applications seeing heavy usage. Applications are currently using Java 1.7
and 1.8.

We were originally running Tomcat 7.x, but perm gen was maxing out very
quickly. The cause was never confirmed, but it was possibly related to a
buggy third-party "enterprise-grade" Java reporting library. We had to
restart the server nightly to try to keep perm gen from maxing out. Part of
the reason was that this third-party library spawned immortal threads that
prevented an application from being unloaded and garbage collected when a
newer build of that application was deployed (the developers behind it never
expected the library to be run on a server hosting multiple applications).
So we upgraded Tomcat to 8.5.x first and then to 9.x recently. This fixed
the perm gen issue.
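
To illustrate the mechanism described above: a thread that never exits keeps
a reference to its context class loader, so the web application that started
it cannot be unloaded. The following is only a minimal sketch, assuming Java
8; the class name ThreadLoaderReport is hypothetical and not part of our
code base. It groups live threads by context class loader so that leftover
threads from an undeployed application can be spotted. In Tomcat I would
expect web application loaders to show up with class names containing
"WebappClassLoader", but that is an assumption to verify.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: reports which class loader each live thread is tied to.
public class ThreadLoaderReport {

    /** Groups the names of all live threads by a label for their context class loader. */
    public static Map<String, List<String>> threadsByContextLoader() {
        Map<String, List<String>> report = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            ClassLoader cl = t.getContextClassLoader();
            // Label = loader class name plus identity hash, so two instances
            // of the same loader class (two deployments) stay distinguishable.
            String key = (cl == null)
                    ? "<no context class loader>"
                    : cl.getClass().getName() + "@"
                        + Integer.toHexString(System.identityHashCode(cl));
            report.computeIfAbsent(key, k -> new ArrayList<>()).add(t.getName());
        }
        return report;
    }

    public static void main(String[] args) {
        threadsByContextLoader().forEach((loader, names) ->
                System.out.println(loader + " -> " + names));
    }
}
```

If the same loader label still appears after an application has been
redeployed, the threads listed under it are candidates for the "immortal"
threads mentioned above.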

Our current issue is that, for some unknown reason and after seemingly
random lengths of time, an application gets into a state where pages fail to
load or do not load correctly. According to the network tab in Chrome's
developer tools, a random subset of static resources (JavaScript, CSS,
images) returns 500 errors instead of being served. Whether the page loads
or not depends on exactly which resources were not returned. Every time you
access any page in that application, a different random subset of resources
returns 500 errors.
There's no indication in any of Tomcat's log files that an application is
in this state. The application will stay in this unusable state until it is
restarted or the server is restarted.
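
Since the logs show nothing, the bad state could in principle be detected
from the outside by requesting a few static resources and checking for 5xx
responses. The following is a minimal sketch only, assuming Java 8; the
class name StaticResourceProbe and the URLs passed on the command line are
hypothetical.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical probe: fetches resources and reports which ones return 5xx.
public class StaticResourceProbe {

    /** Returns the HTTP status of one GET request, or -1 if the request itself failed. */
    public static int statusOf(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            return -1; // malformed URL, connection refused, timeout, ...
        }
    }

    /** Picks out the URLs whose recorded status is a 5xx server error. */
    public static List<String> serverErrors(Map<String, Integer> statuses) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Integer> e : statuses.entrySet()) {
            int status = e.getValue();
            if (status >= 500 && status <= 599) {
                failed.add(e.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<String, Integer> statuses = new LinkedHashMap<>();
        for (String url : args) {
            statuses.put(url, statusOf(url));
        }
        System.out.println("5xx responses: " + serverErrors(statuses));
    }
}
```

Run periodically against a handful of known JavaScript, CSS, and image URLs,
something like this would at least record when an application enters the
broken state, even though it says nothing about the cause.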

We've once again resorted to scheduling a nightly server restart. This has
cut down on how often the problem occurs, which hints that it is related to
usage, but it still happens once a week and sometimes more. The applications
that experience this the most are, I believe, the more heavily used ones.

No Spring application on our other servers has experienced this issue, which
leads me to tentatively say that Spring is not affected and/or is not a
cause of the issue. However, migrating all applications to Spring is not
feasible at the moment.

We've tried upgrading Struts to 1.3.10 in the most frequently affected
applications, but it did not solve the issue and actually introduced
another problem stemming from a bug in that Struts version. So we had to go
back to 1.3.8.

I spoke with a couple of people in Tomcat's IRC channel and they seemed to
think it was a third-party library or a problem/race condition between the
Struts and Tomcat servlets. While this may be important information, I have
no idea what to do with it.

I'm not sure debugging is a possibility because it's a remote server and I
wouldn't even know what to look for. I also can't allow a production
application to remain in this state for very long.

I can't file a bug report because I can't reproduce it at will and I am
unable to provide thread or heap dumps.

I have a suspicion it may be caused by that third-party library, although I
don't see how that library would affect Tomcat's serving of static
resources.

This issue has never happened on our test server or our local instances of
Tomcat. Since I suspect it's related to usage, this is not surprising.

Any help would be greatly appreciated.

Chris
