DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://issues.apache.org/bugzilla/show_bug.cgi?id=28259>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=28259 Synchronous cluster very slow - select() bug Summary: Synchronous cluster very slow - select() bug Product: Tomcat 5 Version: 5.0.21 Platform: Sun OS/Version: Solaris Status: NEW Severity: Normal Priority: Other Component: Catalina:Cluster AssignedTo: [EMAIL PROTECTED] ReportedBy: [EMAIL PROTECTED] CC: [EMAIL PROTECTED] In a TC cluster with synchronous replication there is a bug in using select() and registering operations on selectors. If I use a high tcpSelectorTimeout (e.g. 1000ms) I can observe the secondary node waiting in the midlle of a request until the timeout occurs, although replication data is available. As a consequence requests take 1 second to work on, although only the TC instances need only 20ms to process the request. Nearly 1 second is spend waiting for select() to return in the middle of processing the request. I assume this is a bug in the logic handling the registration of operations on the selector? It is not a problem of high load, one can observe using single requests, but it is necessary to use session atributes in the request. I expect the same problem to occur in pooled mode. The problem does not increase response times in asynchronous mode, but I expect, that it will increase replication time. Here is a protocol of what is happening. Most of these log lines are non standard, I added them to the code, to find out the flow of control in this situation. The intersting thing happens between step 5 and 6. 1) Node 1: Request coming in at 12:39:32,563 : 2004-04-07 12:39:32,563 (http-21080-Processor25) org.apache.catalina.core.StandardWrapper/DEBUG: Returning non-STM instance 2004-04-07 12:39:32,564 (http-21080-Processor25) org.apache.jasper.servlet.JspServlet/DEBUG: JspEngine --> /repit.jsp 2004-04-07 12:39:32,564 (http-21080-Processor25) org.apache.jasper.servlet.JspServlet/DEBUG: ServletPath: /repit.jsp 2004-04-07 12:39:32,564 (http-21080-Processor25) org.apache.jasper.servlet.JspServlet/DEBUG: PathInfo: null 2004-04-07 12:39:32,564 (http-21080-Processor25) org.apache.jasper.servlet.JspServlet/DEBUG: RealPath: /opt/iob3_tomcat_a/webapps/ROOT/repit.jsp 2004-04-07 12:39:32,564 (http-21080-Processor25) org.apache.jasper.servlet.JspServlet/DEBUG: RequestURI: /repit.jsp 2004-04-07 12:39:32,564 (http-21080-Processor25) org.apache.jasper.servlet.JspServlet/DEBUG: QueryString: null 2004-04-07 12:39:32,564 (http-21080-Processor25) org.apache.jasper.servlet.JspServlet/DEBUG: Request Params: 2) Node 1: New session message EVT_SESSION_CREATED is being send at 12:39:32,567: 2004-04-07 12:39:32,566 (http-21080-Processor25) org.apache.catalina.cluster.session.DeltaManager/DEBUG: Sending EVT_SESSION_CREATED: Sending creation of Session with id=CD25AAD154CC01843779EA36B143D101.mkbtomcat1 isValid=true isNew=true accessCount=0 maxInactiveInterval=900 isPrimarySession=true creationTime=1081334372565 lastAccessedTime=1081334372565 lastTimeReplicated=1081334372565 2004-04-07 12:39:32,566 (http-21080-Processor25) org.apache.catalina.cluster.tcp.SimpleTcpCluster/DEBUG: Starting CatalinaCluster.send for msg 1 session CD25AAD154CC01843779EA36B143D101.mkbtomcat1 and destination null 3) Node 2: Message is being received at 12:39:32,568, sending ack at 12:39:32,573: 2004-04-07 12:39:32,568 (ClusterReceiver) org.apache.catalina.cluster.tcp.ReplicationListener/DEBUG: ReplicationListener listen in while doListen after select returned n=1 2004-04-07 12:39:32,568 (ClusterReceiver) org.apache.catalina.cluster.tcp.ReplicationListener/DEBUG: ReplicationListener woke up in listen 2004-04-07 12:39:32,568 (ClusterReceiver) org.apache.catalina.cluster.tcp.ReplicationListener/DEBUG: ReplicationListener calling serviceChannel from readDataFromSocket in TcpReplicationThread 2004-04-07 12:39:32,568 (ClusterReceiver) org.apache.catalina.cluster.tcp.TcpReplicationThread/DEBUG: TcpReplicationThread calling notify in serviceChannel 2004-04-07 12:39:32,568 (ClusterReceiver) org.apache.catalina.cluster.tcp.ReplicationListener/DEBUG: ReplicationListener listen in while doListen before select with tcpSelectorTimeout=1000 2004-04-07 12:39:32,569 (p.TcpReplicationThread[2]) org.apache.catalina.cluster.tcp.TcpReplicationThread/DEBUG: TcpReplicationThread woke up with key [EMAIL PROTECTED] 2004-04-07 12:39:32,569 (p.TcpReplicationThread[2]) org.apache.catalina.cluster.tcp.TcpReplicationThread/DEBUG: TcpReplicationThread woke up and calling drainChannel for key [EMAIL PROTECTED] 2004-04-07 12:39:32,569 (p.TcpReplicationThread[2]) org.apache.catalina.cluster.tcp.TcpReplicationThread/DEBUG: TcpReplicationThread Starting drainChannel for key [EMAIL PROTECTED] 2004-04-07 12:39:32,569 (p.TcpReplicationThread[2]) org.apache.catalina.cluster.tcp.TcpReplicationThread/DEBUG: TcpReplicationThread drainChannel inner loop has count 401 and pkgcnt 1 2004-04-07 12:39:32,571 (p.TcpReplicationThread[2]) org.apache.catalina.cluster.session.DeltaManager/DEBUG: Manager () Received SessionMessage of type=SESSION-MODIFIED from org.apache.catalina.cluster.mcast.McastMember [tcp://10.254.65.100:5002,10.254.65.100,5002, alive=230840] 2004-04-07 12:39:32,572 (p.TcpReplicationThread[2]) org.apache.catalina.cluster.session.DeltaManager/DEBUG: Created a DeltaSession with Id[A981679B5BD40B418ED57F874DF8E1ED.mkbtomcat2] Total count=5 2004-04-07 12:39:32,573 (p.TcpReplicationThread[2]) org.apache.catalina.cluster.session.DeltaManager/DEBUG: Receiving EVT_SESSION_CREATED: Overwriting new Session with id=CD25AAD154CC01843779EA36B143D101.mkbtomcat1 isValid=true isNew=false accessCount=0 maxInactiveInterval=900 isPrimarySession=false creationTime=1081334372572 lastAccessedTime=1081334372572 lastTimeReplicated=1081334372571 2004-04-07 12:39:32,573 (p.TcpReplicationThread[2]) org.apache.catalina.cluster.tcp.TcpReplicationThread/DEBUG: TcpReplicationThread drainChannel has pkgcnt 1 2004-04-07 12:39:32,573 (p.TcpReplicationThread[2]) org.apache.catalina.cluster.tcp.TcpReplicationThread/DEBUG: TcpReplicationThread drainChannel sending ack for pkgcnt 1 Important: ClusterReceiver calls select() again at 12:39:32,568. This is the place where now all the time goes by. 4) Node 1: Received ack and proceeds with session.access() at 12:39:32,573. Ends session.access() at 12:39:32,574: 2004-04-07 12:39:32,573 (http-21080-Processor25) org.apache.catalina.cluster.session.DeltaManager/DEBUG: Created a DeltaSession with Id[CD25AAD154CC01843779EA36B143D101.mkbtomcat1] Total count=5 2004-04-07 12:39:32,574 (http-21080-Processor25) org.apache.catalina.cluster.session.DeltaManager/INFO : End of session.access in id=CD25AAD154CC01843779EA36B143D101.mkbtomcat1 isNew=false lastAccessedTime=1081334372565 thisAccessedTime=1081334372574 accessCount=1 5) Node 1: Proceeds working on request until Delta message of type EVT_SESSION_DELTA is being sent at 12:39:32,578: 2004-04-07 12:39:32,576 (http-21080-Processor25) org.apache.catalina.cluster.tcp.ReplicationValve/DEBUG: Invoking replication request on /repit.jsp 2004-04-07 12:39:32,577 (http-21080-Processor25) org.apache.catalina.cluster.session.DeltaManager/DEBUG: DeltaManager request Completed sending EVT_SESSION_DELTA id=CD25AAD154CC01843779EA36B143D101.mkbtomcat1 isValid=true isNew=false accessCount=1 maxInactiveInterval=900 isPrimarySession=true creationTime=1081334372565 lastAccessedTime=1081334372565 lastTimeReplicated=1081334372565 2004-04-07 12:39:32,577 (http-21080-Processor25) org.apache.catalina.cluster.session.DeltaManager/DEBUG: DeltaManager request Completed returns with msg non null 2004-04-07 12:39:32,577 (http-21080-Processor25) org.apache.catalina.cluster.tcp.SimpleTcpCluster/DEBUG: Starting CatalinaCluster.send for msg 13 session CD25AAD154CC01843779EA36B143D101.mkbtomcat1 and destination null 2004-04-07 12:39:32,577 (http-21080-Processor25) org.apache.catalina.cluster.tcp.SimpleTcpCluster/DEBUG: Setting msgID in CatalinaCluster.send for msg 13 session CD25AAD154CC01843779EA36B143D101.mkbtomcat1 and destination null to ABC9CBA 2004-04-07 12:39:32,578 (http-21080-Processor25) org.apache.catalina.cluster.tcp.SimpleTcpCluster/DEBUG: Calling clusterSender.sendMessage for msg 13 session CD25AAD154CC01843779EA36B143D101.mkbtomcat1 ID ABC9CBA and destination NULL 6) Now there is no activity for 993 milliseconds - which is the problem - until 1003 milliseconds after the last call to select() in ClusterReceiver on node 2, although there is one message of type EVT_SESSION_DELTA waiting to be read. Finally the select returns at 12:39:33,571 but with n=0 and another immediate call to select () returns instantaneously with n=1. From here on everything proceeds normal, the message is being read and ack is sent at 12:39:33,577: 2004-04-07 12:39:33,571 (ClusterReceiver) org.apache.catalina.cluster.tcp.ReplicationListener/DEBUG: ReplicationListener listen in while doListen after select returned n=0 2004-04-07 12:39:33,571 (ClusterReceiver) org.apache.catalina.cluster.tcp.ReplicationListener/DEBUG: ReplicationListener listen in while doListen before select with tcpSelectorTimeout=1000 2004-04-07 12:39:33,571 (ClusterReceiver) org.apache.catalina.cluster.tcp.ReplicationListener/DEBUG: ReplicationListener listen in while doListen after select returned n=1 2004-04-07 12:39:33,571 (ClusterReceiver) org.apache.catalina.cluster.tcp.ReplicationListener/DEBUG: ReplicationListener wokeup in listen 2004-04-07 12:39:33,571 (ClusterReceiver) org.apache.catalina.cluster.tcp.ReplicationListener/DEBUG: ReplicationListener calling serviceChannel from readDataFromSocket in TcpReplicationThread 2004-04-07 12:39:33,571 (ClusterReceiver) org.apache.catalina.cluster.tcp.TcpReplicationThread/DEBUG: TcpReplicationThread calling notify in serviceChannel 2004-04-07 12:39:33,571 (ClusterReceiver) org.apache.catalina.cluster.tcp.ReplicationListener/DEBUG: ReplicationListener listen in while doListen before select with tcpSelectorTimeout=1000 2004-04-07 12:39:33,571 (p.TcpReplicationThread[1]) org.apache.catalina.cluster.tcp.TcpReplicationThread/DEBUG: TcpReplicationThread woke up with key [EMAIL PROTECTED] 2004-04-07 12:39:33,572 (p.TcpReplicationThread[1]) org.apache.catalina.cluster.tcp.TcpReplicationThread/DEBUG: TcpReplicationThread woke up and calling drainChannel for key [EMAIL PROTECTED] 2004-04-07 12:39:33,572 (p.TcpReplicationThread[1]) org.apache.catalina.cluster.tcp.TcpReplicationThread/DEBUG: TcpReplicationThread Starting drainChannel for key [EMAIL PROTECTED] 2004-04-07 12:39:33,572 (p.TcpReplicationThread[1]) org.apache.catalina.cluster.tcp.TcpReplicationThread/DEBUG: TcpReplicationThread drainChannel inner loop has count 540 and pkgcnt 1 2004-04-07 12:39:33,574 (p.TcpReplicationThread[1]) org.apache.catalina.cluster.tcp.SimpleTcpCluster/DEBUG: Received in clusterSender.messageDataReceived msg 13 session CD25AAD154CC01843779EA36B143D101.mkbtomcat1 2004-04-07 12:39:33,574 (p.TcpReplicationThread[1]) org.apache.catalina.cluster.session.DeltaManager/DEBUG: Manager () Received SessionMessage of type=SESSION-DELTA from org.apache.catalina.cluster.mcast.McastMember [tcp://10.254.65.100:5002,10.254.65.100,5002, alive=230851] 2004-04-07 12:39:33,576 (p.TcpReplicationThread[1]) org.apache.catalina.cluster.session.DeltaManager/INFO : End of session.access in id=CD25AAD154CC01843779EA36B143D101.mkbtomcat1 isNew=false lastAccessedTime=1081334372572 thisAccessedTime=1081334373575 accessCount=1 2004-04-07 12:39:33,576 (p.TcpReplicationThread[1]) org.apache.catalina.cluster.session.DeltaManager/INFO : End of session.endAccess in id=CD25AAD154CC01843779EA36B143D101.mkbtomcat1 isNew=false lastAccessedTime=1081334372572 thisAccessedTime=1081334373575 accessCount=0 2004-04-07 12:39:33,577 (p.TcpReplicationThread[1]) org.apache.catalina.cluster.session.DeltaManager/DEBUG: Receiving EVT_SESSION_DELTA: Updating Session with id=CD25AAD154CC01843779EA36B143D101.mkbtomcat1 isValid=true isNew=false accessCount=0 maxInactiveInterval=900 isPrimarySession=false creationTime=1081334372572 lastAccessedTime=1081334372572 lastTimeReplicated=1081334372571 2004-04-07 12:39:33,577 (p.TcpReplicationThread[1]) org.apache.catalina.cluster.tcp.TcpReplicationThread/DEBUG: TcpReplicationThread drainChannel has pkgcnt 1 2004-04-07 12:39:33,577 (p.TcpReplicationThread[1]) org.apache.catalina.cluster.tcp.TcpReplicationThread/DEBUG: TcpReplicationThread drainChannel sending ack for pkgcnt 1 7) The ack is seen on node 1 at 12:39:33,579 and session.endAccess() ends the processing of the request 2004-04-07 12:39:33,579 (http-21080-Processor25) org.apache.catalina.cluster.session.DeltaManager/INFO : End of session.endAccess in id=CD25AAD154CC01843779EA36B143D101.mkbtomcat1 isNew=false lastAccessedTime=1081334372565 thisAccessedTime=1081334372574 accessCount=0 --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]