https://issues.apache.org/bugzilla/show_bug.cgi?id=52529
             Bug #: 52529
           Summary: Tomcat stops working after NullPointerException
           Product: Tomcat 7
           Version: 7.0.14
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: critical
          Priority: P2
         Component: Cluster
        AssignedTo: dev@tomcat.apache.org
        ReportedBy: dan...@orbisfn.com
    Classification: Unclassified

I have a cluster of 2 Tomcats, both are 7.0.14; the config for cluster is as follows:

<!-- Cluster configuration -->
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
         channelSendOptions="8">
  <Manager className="org.apache.catalina.ha.session.DeltaManager"
           expireSessionsOnShutdown="false"
           notifyListenersOnReplication="true"/>
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <Membership className="org.apache.catalina.tribes.membership.McastService"
                address="228.0.0.4"
                port="50000"
                ttl="1"
                frequency="500"
                dropTime="3000" />
    <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
              address="192.168.110.96"
              port="8000"
              autoBind="100"
              selectorTimeout="5000"
              maxThreads="6" />
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
    </Sender>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
  </Channel>
  <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter=""/>
  <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
  <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
  <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
</Cluster>

It has started running this week.
Thus far, every single day both Tomcats go offline and stop responding at around the same time due to the following error:

Tomcat 1:
2012-01-25 14:58:09,246 [Tribes-Task-Receiver-3] ERROR org.apache.catalina.ha.session.DeltaManager- Manager [domain#]: Unable to receive message through TCP channel
java.lang.NullPointerException

Tomcat 2:
2012-01-25 15:00:24,427 [Tribes-Task-Receiver-5] ERROR org.apache.catalina.ha.session.DeltaManager- Manager [other domain#]: Unable to receive message through TCP channel
java.lang.NullPointerException

After that, the following messages repeat until both Tomcats are stopped and restarted one at a time:

2012-01-25 15:05:22,528 [Membership-MemberExpired.] INFO org.apache.catalina.tribes.group.interceptors.TcpFailureDetector- Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{192, 168, 110, 69}:8000,{192, 168, 110, 69},8000, alive=77417309, securePort=-1, UDP Port=-1, id={110 7 26 17 -17 -10 79 10 -107 -51 -64 70 -73 -50 116 -102 }, payload={}, command={}, domain={}, ]] message. Will verify.
2012-01-25 15:05:22,528 [Membership-MemberExpired.] INFO org.apache.catalina.tribes.group.interceptors.TcpFailureDetector- Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{192, 168, 110, 69}:8000,{192, 168, 110, 69},8000, alive=77417309, securePort=-1, UDP Port=-1, id={110 7 26 17 -17 -10 79 10 -107 -51 -64 70 -73 -50 116 -102 }, payload={}, command={}, domain={}, ]]
2012-01-25 15:07:23,278 [Membership-MemberExpired.] INFO org.apache.catalina.tribes.group.interceptors.TcpFailureDetector- Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{192, 168, 110, 69}:8000,{192, 168, 110, 69},8000, alive=77538070, securePort=-1, UDP Port=-1, id={110 7 26 17 -17 -10 79 10 -107 -51 -64 70 -73 -50 116 -102 }, payload={}, command={}, domain={}, ]] message. Will verify.
2012-01-25 15:07:23,279 [Membership-MemberExpired.] INFO org.apache.catalina.tribes.group.interceptors.TcpFailureDetector- Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{192, 168, 110, 69}:8000,{192, 168, 110, 69},8000, alive=77538070, securePort=-1, UDP Port=-1, id={110 7 26 17 -17 -10 79 10 -107 -51 -64 70 -73 -50 116 -102 }, payload={}, command={}, domain={}, ]]

This is becoming a hurdle for our platform, and I'm not sure where the problem occurs because the exception is logged without a stack trace.

Is it possible to modify Tomcat's cluster logic to be highly fault tolerant? Taking down the whole Tomcat because something went wrong during session sync seems like bad and quite dangerous logic.