We run a cluster of Tomcat servers with Apache as a front-end load balancer, using mod_jk configured for sticky sessions. Our primary application provides users with access to their financial accounts. Fast response times are as important as session replication.

We are starting to have problems with response times: at times the application becomes virtually unresponsive. Based on research into how our clustering is currently set up, I believe the problem is that the servers are tied up replicating session data. We have twelve instances of Tomcat spread across three servers (nine in the production cluster, three in the test cluster). Here is our current cluster definition (the only values that vary between instances are the tcpListenAddress and tcpListenPort):

      <Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
               managerClassName="org.apache.catalina.cluster.session.DeltaManager"
               expireSessionsOnShutdown="false"
               useDirtyFlag="true"
               notifyListenersOnReplication="true">

          <Membership
              className="org.apache.catalina.cluster.mcast.McastService"
              mcastAddr="228.0.0.4"
              mcastPort="45564"
              mcastFrequency="500"
              mcastDropTime="3000"/>

          <Receiver
              className="org.apache.catalina.cluster.tcp.ReplicationListener"
              tcpListenAddress="10.9.100.2"
              tcpListenPort="4021"
              tcpSelectorTimeout="100"
              tcpThreadCount="6"/>

          <Sender
              className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
              replicationMode="pooled"
              ackTimeout="15000"
              waitForAck="true"/>

          <Valve
              className="org.apache.catalina.cluster.tcp.ReplicationValve"
              filter=".*\.gif;.*\.js;.*\.jpg;.*\.png;.*\.htm;.*\.html;.*\.css;.*\.txt;"/>

          <ClusterListener
              className="org.apache.catalina.cluster.session.ClusterSessionListener"/>
      </Cluster>

Based on what I have found in my research, it seems we need to either A) continue with replicationMode="pooled" and increase the tcpThreadCount substantially, or B) switch to replicationMode="fastasyncqueue" with a tcpThreadCount of "8". I would prefer to continue to use "pooled" to provide the best failover if we should need to stop or restart a cluster instance. However, I cannot afford to have the application "disappear" from the end user's perspective because of session replication demands.
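
To make option B concrete, this is roughly what I think the changed Receiver and Sender elements would look like. It is only a sketch and has not been tested. The keepAliveTimeout and keepAliveMaxRequestCount attribute names are my assumption about how the "one minute idle time" and "number of request (100)" settings from the javadocs are exposed, and the values shown simply restate those defaults; they are not recommendations.

          <!-- Option B sketch: only tcpThreadCount changes in the Receiver -->
          <Receiver
              className="org.apache.catalina.cluster.tcp.ReplicationListener"
              tcpListenAddress="10.9.100.2"
              tcpListenPort="4021"
              tcpSelectorTimeout="100"
              tcpThreadCount="8"/>

          <!-- Option B sketch: keepAlive* attribute names are assumed, values are the javadoc defaults -->
          <Sender
              className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
              replicationMode="fastasyncqueue"
              ackTimeout="15000"
              waitForAck="true"
              keepAliveTimeout="60000"
              keepAliveMaxRequestCount="100"/>

Everything else in the Cluster element would stay as shown above.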


Questions for a pooled cluster:
• How high should the tcpThreadCount be set? Should the value be related to the average number of sessions?
• Should the ackTimeout be altered to help prevent the application from getting stuck doing replication?


Questions for a fastasyncqueue cluster:
• The javadocs for FastAsyncSocketSender say to "Limit the queue lock contention under high load!" How?
• They also say "after one minute idle time, or number of request (100) the connection is reconnected with next request. Change this for production use!" Change it higher or lower? Should the value be related to the average number of sessions?
• Another concern is the comment that the ackTimeout default of 15 seconds is "very low for big all session replication messages after restart a node". That description seems to fit our servers accurately. I had been concerned that this value might be too high for our servers in a "pooled" cluster.

Any other helpful suggestions will be greatly appreciated!


Thanks!


Mark
