It's a Linux distro:

Linux version 2.6.32-358.14.1.el6.x86_64 (mockbu...@x86-022.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Mon Jun 17 15:54:20 EDT 2013

Java version: 1.6 update 45.

I doubt that a security group change was suddenly applied to that port. I am able to telnet from the server where Tomcat was shut down to port 4444 on the currently running server.

Yes, an OS restart was done for a hardware upgrade (RAM and disk volume).

On Tue, Aug 12, 2014 at 6:58 AM, Igor Cicimov <icici...@gmail.com> wrote:

> On 12/08/2014 4:24 PM, "Krishna Saranathan" <krishna.saran...@gmail.com>
> wrote:
> >
> > We have J2EE war application deployed in a cluster setup having two
> > nodes. Tomcat 6.0.39 is installed in the both nodes having identical
> > war deployed in both. Its deployed in Amazon AWS environment, and the
>
> What distro? Win or linux? And if linux which one?
>
> > two ec2-nodes are beneath an ELB , with session stickiness enabled for
> > JSESSIONID. Also the two tomcat nodes are session replication enabled
> > too.
> >
> > Following is Cluster config updated server.xml file:
> >
> > =============================================================================
> > <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
> >          channelSendOptions="6" channelStartOptions="3">
> >
> >   <Manager className="org.apache.catalina.ha.session.DeltaManager"
> >            expireSessionsOnShutdown="false" notifyListenersOnReplication="true" />
> >
> >   <Channel className="org.apache.catalina.tribes.group.GroupChannel">
> >
> >     <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
> >               autoBind="0" selectorTimeout="5000"
> >               maxThreads="6"
> >               address="x.x.x.x" port="4444" />
> >     <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
> >       <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
> >                  timeout="60000"
> >                  keepAliveTime="10"
> >                  keepAliveCount="0" />
> >     </Sender>
> >     <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpPingInterceptor"
> >                  staticOnly="true"/>
> >     <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
> >     <Interceptor className="org.apache.catalina.tribes.group.interceptors.StaticMembershipInterceptor">
> >       <Member className="org.apache.catalina.tribes.membership.StaticMember"
> >               host="x.x.x.x"
> >               port="4444"
> >               uniqueId="{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4}"/>
> >     </Interceptor>
> >   </Channel>
> >   <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter="" />
> >   <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve" />
> >   <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
> >   <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
> > </Cluster>
> >
> > ==========================================================================
> >
> > Receiver ip, static member ip and unique id is different in the
> > server.xml of the other node in the cluster.
> >
> > this was running fine in production environment for 3 months. Suddenly there was
> > an exception logged like this :, and started coming up infinitely.
> >
> > ==================================================
> > Aug 6, 2014 12:00:39 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
> > INFO: Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://10.160.40.12:4444,10.160.40.12,4444, alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 }, payload={}, command={}, domain={}, ]] message. Will verify.
> > Aug 6, 2014 12:00:39 AM org.apache.catalina.tribes.group.interceptors.TcpFailureDetector memberDisappeared
> > INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://10.160.40.12:4444,10.160.40.12,4444, alive=0,id={0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 }, payload={}, command={}, domain={}, ]]
> > Aug 6, 2014 12:00:39 AM org.apache.catalina.ha.tcp.SimpleTcpCluster send
> > SEVERE: Unable to send message through cluster sender.
> > org.apache.catalina.tribes.ChannelException: Operation has timed out(60000 ms.).; Faulty members:tcp://10.160.40.12:4444;
> >     at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:97)
> >     at org.apache.catalina.tribes.transport.nio.PooledParallelSender.sendMessage(PooledParallelSender.java:53)
> >     at org.apache.catalina.tribes.transport.ReplicationTransmitter.sendMessage(ReplicationTransmitter.java:80)
> >     at org.apache.catalina.tribes.group.ChannelCoordinator.sendMessage(ChannelCoordinator.java:76)
> >     at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
> >     at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
> >     at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.sendMessage(TcpFailureDetector.java:88)
> >     at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
> >     at org.apache.catalina.tribes.group.ChannelInterceptorBase.sendMessage(ChannelInterceptorBase.java:75)
> >     at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:216)
> >     at org.apache.catalina.tribes.group.GroupChannel.send(GroupChannel.java:175)
> >     at org.apache.catalina.ha.tcp.SimpleTcpCluster.send(SimpleTcpCluster.java:817)
> >     at org.apache.catalina.ha.tcp.SimpleTcpCluster.sendClusterDomain(SimpleTcpCluster.java:791)
> >     at org.apache.catalina.ha.tcp.ReplicationValve.send(ReplicationValve.java:553)
> >     at org.apache.catalina.ha.tcp.ReplicationValve.sendMessage(ReplicationValve.java:537)
> >     at org.apache.catalina.ha.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:519)
> >     at org.apache.catalina.ha.tcp.ReplicationValve.sendReplicationMessage(ReplicationValve.java:430)
> >     at org.apache.catalina.ha.tcp.ReplicationValve.invoke(ReplicationValve.java:363)
> >     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> >     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
> >     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
> >     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
> >     at java.lang.Thread.run(Thread.java:662)
> >
> > ============================================================================
> >
> > After this, the web application is not accessible, and we have to
> > manually kill the tomcat process in one node, thereby disabling the
> > cluster.
> >
> > We are unsure, how all of a sudden this is coming, and disabling
> > application access altogether. If there are any suggestion on remedy,
> > pls provide the same.
>
> Firewall???
> Did you change something in the SecurityGroup the instances belong to that
> might have affected the port 4444? Can you telnet from the server you shut
> down tomcat to port 4444 on the server tomcat is running on? Did you do a
> restart or OS update/upgrade that might have pulled some firewall package
> and activated it afterwards?
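
In case it helps to repeat the telnet check from a script on either node, here is a rough Java sketch of the same test. The class name PortCheck and the method isReachable are made up for illustration and are not part of Tomcat or Tribes. It only confirms that a TCP connection to the receiver port can be opened within a timeout (5 seconds here, an arbitrary choice); it does not show whether replication messages are actually being processed on the other side. It avoids try-with-resources so that it should also compile on Java 1.6.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration only; not part of Tomcat or Tribes.
class PortCheck {

    // Returns true if a TCP connection to host:port can be opened within timeoutMs.
    static boolean isReachable(String host, int port, int timeoutMs) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, unresolved host, or otherwise unreachable.
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails.
            }
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: java PortCheck <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        boolean ok = isReachable(host, port, 5000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

Run it in both directions, for example `java PortCheck 10.160.40.12 4444` from the other node, since a firewall rule may block only one direction.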