On 15/08/18 20:52, Mark Thomas wrote:
> On 15/08/18 20:43, Scott Evans wrote:
>> Hi,
>>
>> Our system is on Apache Tomcat Version 8.0.47.
>> OS is Windows Server 2012 R2 Datacenter.
>>
>> We are looking for someone that may be interested in paid contract work to
>> assist with troubleshooting and resolving a Tomcat clustering issue in our
>> system.
>>
>> The system is composed of multiple Java PrimeFaces applications running in
>> a clustered Tomcat environment which is experiencing occasional
>> deadlocking issues from an unknown source requiring the Nodes to be cycled
>> in order to resolve.  The issue is only occurring in our Production
>> environment and we've determined that the issues are occurring at random
>> with the replication threads.
>>
>> We would need someone to help investigate our configuration and determine
>> if there are any further changes that can be made to our system to catch
>> these deadlock issues before they occur (requiring a Node cycle).
>>
>> Please let me know if you or someone you know may be interested or if you
>> have further questions I can help answer.
> 
> If you can provide a thread dump of the deadlock when it occurs we can
> probably help you here for free.

Scott provided me with a sanitised copy of the thread-dump off-line. I'm
sharing my analysis with the list (with Scott's permission) as I think
the root cause is likely to be of wider interest.

There was, indeed, a deadlock.

The issue was as follows.

The application is using JSF. Specifically, the Mojarra implementation
from Oracle.

There are multiple concurrent requests for the same session.

Each request is processed by a dedicated thread (this is mandated by the
Servlet spec although it may not be expressed that way).

The threads in question are:

A. ajp-apr-8009-exec-9005
B. ajp-apr-8009-exec-9000

Thread A is in the middle of processing a request. It is evaluating some
EL which requires access to the view map which in turn causes the
ViewMap to update the session.
com.sun.faces.application.view.ViewScopeManager.processEvent locks the
ViewMap object. It then tries to update the session. To do this it
requires the session lock. Thread A is waiting for this lock.

Thread B is at the end of a request. The session has been updated and it
is attempting to write the updated session attributes to the cluster.
The session lock has been obtained. The individual attributes are being
written. The code has reached the ViewMap object. In order to write this
object, the ViewMap object must be locked. Thread B is waiting for this
lock.

So, thread A holds the ViewMap lock and is waiting for the session lock,
while thread B holds the session lock and is waiting for the ViewMap
lock. Neither can proceed. Deadlock.
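For anyone wanting to reproduce the pattern or catch it at runtime, here
is a minimal sketch in plain Java. It is not Tomcat or Mojarra code: the
two lock objects are stand-ins for the ViewMap and session monitors, and
the class and method names are made up for illustration. It starts two
daemon threads that take the locks in opposite order, then polls the
JVM's ThreadMXBean.findDeadlockedThreads() to confirm the deadlock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Illustrative only: hypothetical names, not Tomcat or Mojarra classes.
class LockOrderDeadlockSketch {

    /** Starts two daemon threads that acquire two locks in opposite order. */
    static void startOpposingThreads() {
        Object viewMapLock = new Object();  // stand-in for the ViewMap monitor
        Object sessionLock = new Object();  // stand-in for the session lock
        CountDownLatch bothHoldFirst = new CountDownLatch(2);

        // "A" locks the view map first, then wants the session.
        Thread a = new Thread(
                () -> lockInOrder(viewMapLock, sessionLock, bothHoldFirst),
                "sketch-thread-A");
        // "B" locks the session first, then wants the view map.
        Thread b = new Thread(
                () -> lockInOrder(sessionLock, viewMapLock, bothHoldFirst),
                "sketch-thread-B");
        a.setDaemon(true);  // daemon threads so the JVM can still exit
        b.setDaemon(true);
        a.start();
        b.start();
    }

    static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await();  // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: the other thread holds 'second' and wants 'first'.
            }
        }
    }

    /** Polls for deadlocked threads; returns an empty array on timeout. */
    static long[] awaitDeadlock(long timeoutMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads();  // null when none found
            if (ids != null) {
                return ids;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new long[0];
    }

    public static void main(String[] args) {
        startOpposingThreads();
        long[] ids = awaitDeadlock(5000);
        System.out.println("Deadlocked threads found: " + ids.length);
    }
}
```

The same findDeadlockedThreads() call could in principle be run
periodically inside a monitored JVM to raise an alert, although it only
reports a deadlock once it has already happened.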

This is, in essence, caused by a combination of how Tomcat's clustering
is designed and how Mojarra is implemented.

The application is using the BackupManager, I assume with sticky
sessions. Therefore, I would expect session failover between nodes to be
a rare event.

My recommendation is to investigate excluding the ViewMap from
replication via sessionAttributeNameFilter. You'd need a regular
expression that matched anything except
"com.sun.faces.application.view.activeViewMaps".
I don't know how integral this object is to Mojarra. Mojarra may simply
recreate this object if required. If not, you may need to trigger
recreation after failover. I don't know how feasible this solution is.
It will require some testing and possibly code changes.
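As a starting point for that testing, here is a small Java sketch of one
regular expression that matches every attribute name except the one
above, using a negative lookahead. The class and method names are
hypothetical. It assumes the filter is applied to the whole attribute
name (Pattern.matcher(name).matches() semantics), which is worth
verifying against the Tomcat version in use before relying on it.

```java
import java.util.regex.Pattern;

// Illustrative only: checks a candidate filter expression outside Tomcat.
class ViewMapFilterSketch {

    // Candidate value for sessionAttributeNameFilter: matches any name
    // except exactly com.sun.faces.application.view.activeViewMaps.
    static final String FILTER_REGEX =
            "(?!com\\.sun\\.faces\\.application\\.view\\.activeViewMaps$).*";

    static final Pattern FILTER = Pattern.compile(FILTER_REGEX);

    /** True if an attribute with this name would pass the filter. */
    static boolean wouldReplicate(String attributeName) {
        // Assumption: the filter must match the full attribute name.
        return FILTER.matcher(attributeName).matches();
    }

    public static void main(String[] args) {
        String[] names = {
            "com.sun.faces.application.view.activeViewMaps",
            "com.sun.faces.application.view.activeViewMapsOther",
            "userProfile"
        };
        for (String name : names) {
            System.out.println(name + " -> " + wouldReplicate(name));
        }
    }
}
```

When the expression is placed in server.xml the Java string escaping
goes away, so each "\\." above becomes "\.".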

Has anyone on the users list come across this problem before? If so, how
have you solved it? Suggestions for alternative solutions also welcome.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
