We run a cluster of Tomcat servers with Apache as a front-end load balancer
using mod_jk configured for sticky sessions. Our primary application
provides users with access to their financial accounts. Fast response times
are as important as session replication.
We are starting to have problems with response times. The application
becomes virtually unresponsive. Based on research into how our clustering is
currently set up, I believe the problem is that the servers are tied up
replicating session data. We have twelve instances of Tomcat spread across
three servers (nine in the production cluster, three in the test cluster).
Here is our current cluster definition (the only values that vary between
instances are tcpListenAddress and tcpListenPort):
<Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
         managerClassName="org.apache.catalina.cluster.session.DeltaManager"
         expireSessionsOnShutdown="false"
         useDirtyFlag="true"
         notifyListenersOnReplication="true">
    <Membership
        className="org.apache.catalina.cluster.mcast.McastService"
        mcastAddr="228.0.0.4"
        mcastPort="45564"
        mcastFrequency="500"
        mcastDropTime="3000"/>
    <Receiver
        className="org.apache.catalina.cluster.tcp.ReplicationListener"
        tcpListenAddress="10.9.100.2"
        tcpListenPort="4021"
        tcpSelectorTimeout="100"
        tcpThreadCount="6"/>
    <Sender
        className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
        replicationMode="pooled"
        ackTimeout="15000"
        waitForAck="true"/>
    <Valve
        className="org.apache.catalina.cluster.tcp.ReplicationValve"
        filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*\.txt;"/>
    <ClusterListener
        className="org.apache.catalina.cluster.session.ClusterSessionListener"/>
</Cluster>
Based on what I have found in my research, it seems we need to either
A) continue with replicationMode="pooled" and increase the tcpThreadCount
substantially, or B) switch to replicationMode="fastasyncqueue" with a
tcpThreadCount of "8". I would prefer to keep using "pooled" to get the
best failover if we need to stop or restart a cluster instance. However,
I cannot afford to have the application "disappear" from the end user's
perspective because of session replication demands.
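For concreteness, here is roughly how the two alternatives would differ from the current definition. This is only a sketch, not a tested configuration: in option A the tcpThreadCount="24" is a hypothetical placeholder (choosing the real value is exactly the open question), and option B just applies the replicationMode="fastasyncqueue" and tcpThreadCount of 8 mentioned above, leaving every other attribute as it is today.

```xml
<!-- Option A (sketch): keep replicationMode="pooled" and raise the receiver
     thread count. tcpThreadCount="24" is a HYPOTHETICAL placeholder, not a
     recommendation. The Sender element stays exactly as it is now. -->
<Receiver
    className="org.apache.catalina.cluster.tcp.ReplicationListener"
    tcpListenAddress="10.9.100.2"
    tcpListenPort="4021"
    tcpSelectorTimeout="100"
    tcpThreadCount="24"/>

<!-- Option B (sketch): switch the Sender to fastasyncqueue and use
     tcpThreadCount="8" on the Receiver; all other attributes are copied
     unchanged from the current definition. -->
<Receiver
    className="org.apache.catalina.cluster.tcp.ReplicationListener"
    tcpListenAddress="10.9.100.2"
    tcpListenPort="4021"
    tcpSelectorTimeout="100"
    tcpThreadCount="8"/>
<Sender
    className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
    replicationMode="fastasyncqueue"
    ackTimeout="15000"
    waitForAck="true"/>
```

In either case tcpListenAddress and tcpListenPort would still vary per instance, as they do now.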
Questions for a pooled cluster:
How high should the tcpThreadCount be set? Should the value be related to
the average number of sessions?
Should the ackTimeout be altered to help prevent the application from
getting stuck doing replication?
Questions for a fastasyncqueue cluster:
The Javadoc for FastAsyncSocketSender says to "Limit the queue lock
contention under high load!" How do we do that?
It also says "after one minute idle time, or number of request (100) the
connection is reconnected with next request. Change this for production
use!" Should this be raised or lowered? Should the value be related to the
average number of sessions?
Another concern is the comment that the ackTimeout default of 15 seconds is
"very low for big all session replication messages after restart a node".
That describes our servers accurately. I was concerned that this value
might be too high for our servers in a "pooled" cluster.
Any other helpful suggestions will be greatly appreciated!
Thanks!
Mark
---------------------------------------------------------------------
To start a new topic, e-mail: users@tomcat.apache.org