On 15/08/18 20:52, Mark Thomas wrote:
> On 15/08/18 20:43, Scott Evans wrote:
>> Hi,
>>
>> Our system is on Apache Tomcat Version 8.0.47.
>> OS is Windows Server 2012 R2 Datacenter.
>>
>> We are looking for someone that may be interested in paid contract work to
>> assist with troubleshooting and resolving a Tomcat clustering issue in our
>> system.
>>
>> The system is composed of multiple Java PrimeFaces applications running in
>> a clustered Tomcat environment which is experiencing occasional
>> deadlocking issues from an unknown source requiring the Nodes to be cycled
>> in order to resolve. The issue is only occurring in our Production
>> environment and we've determined that the issues are occurring at random
>> with the replication threads.
>>
>> We would need someone to help investigate our configuration and determine
>> if there are any further changes that can be made to our system to catch
>> these deadlock issues before they occur (requiring a Node cycle).
>>
>> Please let me know if you or someone you know may be interested or if you
>> have further questions I can help answer.
>
> If you can provide a thread dump of the deadlock when it occurs we can
> probably help you here for free.

Scott provided me with a sanitised copy of the thread dump off-line. I'm
sharing my analysis with the list (with Scott's permission) as I think the
root cause is likely to be of wider interest.

There was, indeed, a deadlock. The issue was as follows.

The application is using JSF, specifically the Mojarra implementation from
Oracle.

There are multiple concurrent requests for the same session. Each request is
processed by a dedicated thread (this is mandated by the Servlet spec,
although it may not be expressed that way).

The threads in question are:
A. ajp-apr-8009-exec-9005
B. ajp-apr-8009-exec-9000

Thread A is in the middle of processing a request. It is evaluating some EL
which requires access to the view map, which in turn causes the ViewMap to
update the session. com.sun.faces.application.view.ViewScopeManager.processEvent
locks the ViewMap object. It then tries to update the session. To do this it
requires the session lock. Thread A is waiting for this lock.

Thread B is at the end of a request. The session has been updated and it is
attempting to write the updated session attributes to the cluster. The
session lock has been obtained. The individual attributes are being written.
The code has reached the ViewMap object. In order to write this object, the
ViewMap object must be locked. Thread B is waiting for this lock.

So, thread A holds the lock that thread B wants and is waiting for the lock
thread B is holding. Thread B holds the lock that thread A wants and is
waiting for the lock thread A is holding. Deadlock.

This is, in essence, caused by a combination of how Tomcat's clustering is
designed and how Mojarra is implemented.

The application is using the BackupManager, I assume with sticky sessions.
Therefore, I would expect session failover between nodes to be a rare event.
My recommendation is to investigate excluding the ViewMap from replication
via sessionAttributeNameFilter.

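The lock-ordering inversion described above can be reproduced outside Tomcat
and JSF. This is a minimal sketch, not Tomcat or Mojarra code: two plain Java
objects (hypothetical stand-ins named viewMap and sessionLock) are locked in
opposite orders by two threads, and the JDK's
ThreadMXBean.findDeadlockedThreads() then reports the cycle, much as a thread
dump does. As an assumption, periodically calling the same method could serve
as a watchdog that logs a deadlock as soon as it forms; it detects the
problem, it does not prevent it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDeadlock {

    /** Starts two threads that take the same two monitors in opposite order
     *  and returns true if the JVM reports both threads as deadlocked. */
    static boolean reproduce(long timeoutMillis) throws InterruptedException {
        // Hypothetical stand-ins for the Mojarra ViewMap monitor and the
        // Tomcat session lock; neither is the real class.
        final Object viewMap = new Object();
        final Object sessionLock = new Object();
        // Guarantees each thread holds its first lock before it asks for the second.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Thread A: EL evaluation locks the ViewMap, then needs the session lock.
        Thread a = new Thread(() -> lockInOrder(viewMap, sessionLock, bothHoldFirstLock), "request-A");
        // Thread B: replication holds the session lock, then needs the ViewMap.
        Thread b = new Thread(() -> lockInOrder(sessionLock, viewMap, bothHoldFirstLock), "replication-B");
        // Daemon threads, so the deadlocked pair does not keep the JVM alive.
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads(); // null when no cycle exists
            if (ids != null && contains(ids, a.getId()) && contains(ids, b.getId())) {
                for (ThreadInfo info : mx.getThreadInfo(ids)) {
                    if (info != null) {
                        System.out.println(info.getThreadName() + " waits for "
                                + info.getLockName() + " held by " + info.getLockOwnerName());
                    }
                }
                return true;
            }
            Thread.sleep(50);
        }
        return false;
    }

    static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: the other thread holds this monitor and waits for ours.
            }
        }
    }

    static boolean contains(long[] ids, long id) {
        for (long x : ids) {
            if (x == id) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Deadlock detected: " + reproduce(5_000));
    }
}
```

Taking both monitors in the same order in every code path would remove the
cycle, but that order is split between Tomcat and Mojarra here, which is why
excluding the attribute from replication looks more practical.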
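For the sessionAttributeNameFilter suggestion, one candidate pattern uses a
negative lookahead so that every attribute name matches except the Mojarra
view map. The sketch below only exercises the regular expression with
java.util.regex. It assumes Tomcat replicates an attribute when the whole
name matches the filter (Matcher.matches()); that should be checked against
the ManagerBase source for the Tomcat version in use. The class name, the
willReplicate() helper and the sample names other than
com.sun.faces.application.view.activeViewMaps are hypothetical.

```java
import java.util.regex.Pattern;

public class ViewMapFilterCheck {

    // Candidate sessionAttributeNameFilter value: the negative lookahead
    // rejects exactly the Mojarra view map name; ".*" accepts everything else.
    static final String FILTER =
            "^(?!com\\.sun\\.faces\\.application\\.view\\.activeViewMaps$).*$";

    static final Pattern PATTERN = Pattern.compile(FILTER);

    /** Assumption: an attribute is replicated only if its whole name matches. */
    static boolean willReplicate(String attributeName) {
        return PATTERN.matcher(attributeName).matches();
    }

    public static void main(String[] args) {
        String[] names = {
            "com.sun.faces.application.view.activeViewMaps",       // excluded
            "com.sun.faces.application.view.activeViewMapsSuffix", // hypothetical, still replicated
            "userPreferences"                                      // hypothetical, still replicated
        };
        for (String name : names) {
            System.out.println(name + " -> " + willReplicate(name));
        }
    }
}
```

In server.xml the same pattern would be written without Java string escaping,
that is with single backslashes, as the value of the Manager's
sessionAttributeNameFilter attribute.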
You'd need a regular expression that matches anything except
"com.sun.faces.application.view.activeViewMaps".

I don't know how integral this object is to Mojarra. Mojarra may simply
recreate this object if required. If not, you may need to trigger
recreation after failover. I don't know how feasible this solution is. It
will require some testing and possibly code changes.

Has anyone on the users list come across this problem before? If so, how
have you solved it? Suggestions for alternative solutions are also welcome.

Mark