I haven't tested clustering on Solaris 9, but on Linux it works well. Something is off with your multicast: as your logs show, members are being added and dropped continuously. Try increasing your mcastDropTime; that should keep the members in the cluster for a longer time. Contact me at my apache.org email if you want help with debugging.
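To make the suggestion concrete, here is a rough sketch of the Membership element from the posted configuration with a longer drop time. The value 10000 is only a placeholder chosen for illustration, not a tested recommendation; all other attributes are copied unchanged from the original configuration.

```xml
<!-- mcastDropTime raised from 3000; 10000 is an illustrative value only -->
<Membership className="org.apache.catalina.cluster.mcast.McastService"
            mcastAddr="228.0.0.3"
            mcastPort="45564"
            mcastFrequency="500"
            mcastDropTime="10000"/>
```

With mcastFrequency="500" and mcastDropTime="3000", a member should be dropped after roughly three seconds without a heartbeat. That matches the logs below, where members often disappear about three seconds after being added. A larger drop time gives a node that is slow to send heartbeats more room before it is removed.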
Filip

-----Original Message-----
From: Ilyschenko, Vlad [mailto:[EMAIL PROTECTED]
Sent: Sunday, February 22, 2004 5:15 PM
To: [EMAIL PROTECTED]
Subject: tomcat 5.0.19 cluster problem

Hi,

We are running three Solaris9 boxes with tomcat 5.0.19 on them. Cluster configuration is as follows:

    <Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
             managerClassName="org.apache.catalina.cluster.session.DeltaManager"
             expireSessionsOnShutdown="false"
             useDirtyFlag="true">
      <Membership className="org.apache.catalina.cluster.mcast.McastService"
                  mcastAddr="228.0.0.3"
                  mcastPort="45564"
                  mcastFrequency="500"
                  mcastDropTime="3000"/>
      <Receiver className="org.apache.catalina.cluster.tcp.ReplicationListener"
                tcpListenAddress="auto"
                tcpListenPort="4001"
                tcpSelectorTimeout="100"
                tcpThreadCount="60"/>
      <Sender className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
              replicationMode="pooled"/>
      <Valve className="org.apache.catalina.cluster.tcp.ReplicationValve"
             filter=".*\.gif;.*\.js;.*\.jpg;.*\.htm;.*\.html;.*\.txt;"/>
    </Cluster>

Yesterday tomcat on one of the servers ran out of memory that coincided with a clustered web application hang across all three servers. All tomcat instances started exhibiting cluster problems in one shape or another. I wonder if 5.0.19 cluster has memory leaks. I have not experienced OutOfMemory problems on those boxes running 5.0.16 for over a month. In any case could a cluster node that ran out of memory destroy the entire cluster?

You could find the log fragments from those three boxes below:

Box #1 (IP: 192.168.64.40) - the one with memory problems:

22 Feb 2004 00:26:43 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.36:4001,192.168.64.36,4001, alive=112504278]
22 Feb 2004 00:26:43 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.36:4001,192.168.64.36,4001, alive=112532838]
22 Feb 2004 00:26:53 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.36:4001,192.168.64.36,4001, alive=112532838]
22 Feb 2004 00:26:53 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.36:4001,192.168.64.36,4001, alive=112540488]
22 Feb 2004 00:26:58 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.36:4001,192.168.64.36,4001, alive=112540488]
22 Feb 2004 00:26:58 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.36:4001,192.168.64.36,4001, alive=112548138]
22 Feb 2004 00:27:04 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.41:4001,192.168.64.41,4001, alive=113937290]
22 Feb 2004 00:27:04 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.41:4001,192.168.64.41,4001, alive=113967890]
22 Feb 2004 00:27:09 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.36:4001,192.168.64.36,4001, alive=112548138]
22 Feb 2004 00:27:09 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.36:4001,192.168.64.36,4001, alive=112558338]
22 Feb 2004 00:27:19 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.41:4001,192.168.64.41,4001, alive=113967890]
22 Feb 2004 00:27:19 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.41:4001,192.168.64.41,4001, alive=113981150]
22 Feb 2004 00:27:27 ERROR TP-Processor16 - An exception or error occurred in the container during the request processing
    java.lang.OutOfMemoryError
22 Feb 2004 00:27:27 DEBUG Finalizer - result finalized
22 Feb 2004 00:27:27 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.36:4001,192.168.64.36,4001, alive=112558338]
22 Feb 2004 00:27:27 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.36:4001,192.168.64.36,4001, alive=112573638]
22 Feb 2004 00:27:27 INFO TP-Processor16 - Unknown message 0
22 Feb 2004 00:27:34 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.36:4001,192.168.64.36,4001, alive=112573638]
22 Feb 2004 00:27:34 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.36:4001,192.168.64.36,4001, alive=112581288]

Box #2 (IP: 192.168.64.36):

22 Feb 2004 00:26:43 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117485053]
22 Feb 2004 00:26:48 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117485053]
22 Feb 2004 00:26:53 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117495344]
22 Feb 2004 00:26:56 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117495344]
22 Feb 2004 00:26:58 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117500276]
22 Feb 2004 00:27:01 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117500276]
22 Feb 2004 00:27:03 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117505583]
22 Feb 2004 00:27:06 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117505583]
22 Feb 2004 00:27:08 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117510798]
22 Feb 2004 00:27:14 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117510798]
22 Feb 2004 00:27:19 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117520986]
22 Feb 2004 00:27:22 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117520986]
22 Feb 2004 00:27:26 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117528626]
22 Feb 2004 00:27:29 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117528626]
22 Feb 2004 00:27:30 INFO TP-Processor1 - Unknown message 0
22 Feb 2004 00:27:34 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117536379]

Box #3:

22 Feb 2004 00:26:40 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117477359]
22 Feb 2004 00:26:42 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117485053]
22 Feb 2004 00:26:45 WARN ContainerBackgroundProcessor[StandardEngine[Catalina]] - Wasn't able to read acknowledgement from server in 15000 ms. Disconnecting socket, and trying again.
22 Feb 2004 00:26:45 WARN ContainerBackgroundProcessor[StandardEngine[Catalina]] - Unable to send replicated message, is server down?
    java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        ...
22 Feb 2004 00:26:48 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117485053]
22 Feb 2004 00:26:52 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117495344]
22 Feb 2004 00:26:55 WARN TP-Processor1 - Wasn't able to read acknowledgement from server in 15000 ms. Disconnecting socket, and trying again.
22 Feb 2004 00:26:55 WARN TP-Processor1 - Unable to send replicated message, is server down?
    java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.net.SocketInputStream.read(SocketInputStream.java:182)
        at org.apache.catalina.cluster.tcp.SocketSender.waitForAck(SocketSender.java:181)
        at org.apache.catalina.cluster.tcp.SocketSender.sendMessage(SocketSender.java:172)
        at org.apache.catalina.cluster.tcp.PooledSocketSender.sendMessage(PooledSocketSender.java:166)
        ...
22 Feb 2004 00:26:55 WARN TP-Processor3 - Wasn't able to read acknowledgement from server in 15000 ms. Disconnecting socket, and trying again.
22 Feb 2004 00:26:55 WARN TP-Processor20 - Wasn't able to read acknowledgement from server in 15000 ms. Disconnecting socket, and trying again.
22 Feb 2004 00:26:55 WARN TP-Processor16 - Wasn't able to read acknowledgement from server in 15000 ms. Disconnecting socket, and trying again.
22 Feb 2004 00:26:55 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117495344]
22 Feb 2004 00:26:57 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117500276]
22 Feb 2004 00:27:00 INFO Cluster-MembershipReceiver - Received member disappeared:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117500276]
22 Feb 2004 00:27:02 INFO Cluster-MembershipReceiver - Replication member added:org.apache.catalina.cluster.mcast.McastMember[tcp://192.168.64.40:4001,192.168.64.40,4001, alive=117505583]
22 Feb 2004 05:26:07 WARN TP-Processor138 - No socket sender available for client=/192.168.64.40:4001 did it disappear?
22 Feb 2004 05:26:07 WARN TP-Processor157 - No socket sender available for client=/192.168.64.40:4001 did it disappear?
22 Feb 2004 05:26:07 INFO TP-Processor8 - Unknown message 0
22 Feb 2004 05:26:07 WARN TP-Processor128 - No socket sender available for client=/192.168.64.40:4001 did it disappear?
22 Feb 2004 05:26:07 INFO TP-Processor32 - Unknown message 0
22 Feb 2004 05:26:07 WARN ContainerBackgroundProcessor[StandardEngine[Catalina]] - Unable to send replicated message, is server down?
    java.lang.IllegalStateException: Socket pool is closed.
        at org.apache.catalina.cluster.tcp.PooledSocketSender$SenderQueue.getSender(PooledSocketSender.java:217)
        at org.apache.catalina.cluster.tcp.PooledSocketSender.sendMessage(PooledSocketSender.java:160)
        at org.apache.catalina.cluster.tcp.ReplicationTransmitter.sendMessageData(ReplicationTransmitter.java:164)
        at org.apache.catalina.cluster.tcp.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:196)
        at org.apache.catalina.cluster.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:450)
        ...

Thanks,
Vlad