https://issues.apache.org/bugzilla/show_bug.cgi?id=52529

             Bug #: 52529
           Summary: Tomcat stops working after NullPointerException
           Product: Tomcat 7
           Version: 7.0.14
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: critical
          Priority: P2
         Component: Cluster
        AssignedTo: dev@tomcat.apache.org
        ReportedBy: dan...@orbisfn.com
    Classification: Unclassified


I have a cluster of two Tomcat instances, both running 7.0.14. The cluster
configuration is as follows:


        <!-- Cluster configuration -->
        <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
                 channelSendOptions="8">

            <Manager className="org.apache.catalina.ha.session.DeltaManager"
                     expireSessionsOnShutdown="false"
                     notifyListenersOnReplication="true"/>
            <Channel className="org.apache.catalina.tribes.group.GroupChannel">
                <Membership className="org.apache.catalina.tribes.membership.McastService"
                            address="228.0.0.4" port="50000" ttl="1"
                            frequency="500" dropTime="3000"/>

                <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
                          address="192.168.110.96" port="8000" autoBind="100"
                          selectorTimeout="5000" maxThreads="6"/>

                <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
                    <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
                </Sender>

                <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
                <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
            </Channel>

            <Valve className="org.apache.catalina.ha.tcp.ReplicationValve"
                   filter=""/>
            <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
            <ClusterListener className="org.apache.catalina.ha.session.JvmRouteSessionIDBinderListener"/>
            <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
        </Cluster>

The cluster started running this week. So far, both Tomcats have gone offline
and stopped responding every single day, at around the same time, with the
following error:

Tomcat 1 - 2012-01-25 14:58:09,246 [Tribes-Task-Receiver-3] ERROR
org.apache.catalina.ha.session.DeltaManager- Manager [domain#]: Unable to
receive message through TCP channel
java.lang.NullPointerException
Tomcat 2 - 2012-01-25 15:00:24,427 [Tribes-Task-Receiver-5] ERROR
org.apache.catalina.ha.session.DeltaManager- Manager [other domain#]: Unable to
receive message through TCP channel
java.lang.NullPointerException

After that, the following messages repeat until both instances are stopped and
restarted one at a time:
2012-01-25 15:05:22,528 [Membership-MemberExpired.] INFO 
org.apache.catalina.tribes.group.interceptors.TcpFailureDetector- Received
memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{192,
168, 110, 69}:8000,{192, 168, 110, 69},8000, alive=77417309, securePort=-1, UDP
Port=-1, id={110 7 26 17 -17 -10 79 10 -107 -51 -64 70 -73 -50 116 -102 },
payload={}, command={}, domain={}, ]] message. Will verify.
2012-01-25 15:05:22,528 [Membership-MemberExpired.] INFO 
org.apache.catalina.tribes.group.interceptors.TcpFailureDetector- Verification
complete. Member still
alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{192, 168, 110,
69}:8000,{192, 168, 110, 69},8000, alive=77417309, securePort=-1, UDP Port=-1,
id={110 7 26 17 -17 -10 79 10 -107 -51 -64 70 -73 -50 116 -102 }, payload={},
command={}, domain={}, ]]
2012-01-25 15:07:23,278 [Membership-MemberExpired.] INFO 
org.apache.catalina.tribes.group.interceptors.TcpFailureDetector- Received
memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{192,
168, 110, 69}:8000,{192, 168, 110, 69},8000, alive=77538070, securePort=-1, UDP
Port=-1, id={110 7 26 17 -17 -10 79 10 -107 -51 -64 70 -73 -50 116 -102 },
payload={}, command={}, domain={}, ]] message. Will verify.
2012-01-25 15:07:23,279 [Membership-MemberExpired.] INFO 
org.apache.catalina.tribes.group.interceptors.TcpFailureDetector- Verification
complete. Member still
alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{192, 168, 110,
69}:8000,{192, 168, 110, 69},8000, alive=77538070, securePort=-1, UDP Port=-1,
id={110 7 26 17 -17 -10 79 10 -107 -51 -64 70 -73 -50 116 -102 }, payload={},
command={}, domain={}, ]]
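
As far as I can tell from these messages, TcpFailureDetector reacts to a
membership timeout by checking the member over TCP before dropping it, and here
the check keeps succeeding. A rough sketch of that kind of check, in plain Java
and not Tomcat's actual code (the class and method names are made up for
illustration, and I am assuming a bare TCP connect with a timeout is enough to
count as "alive"):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration only; this is not org.apache.catalina.tribes code.
public class TcpAliveCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or otherwise failed: treat as not alive.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // A local listener stands in for the other node's Receiver port.
        try (ServerSocket peer = new ServerSocket(0)) {
            int port = peer.getLocalPort();
            System.out.println("peer reachable: " + isReachable("127.0.0.1", port, 1000));
        }
    }
}
```

If that reading is right, a check like this only shows that the Receiver port
still accepts connections, which would explain why the member is reported as
alive even though the instance no longer responds to requests.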

This is becoming a serious obstacle for our platform, and I can't tell where
the problem occurs because the logged exception has no stack trace. Is it
possible to make Tomcat's cluster logic more fault tolerant? Taking down the
whole Tomcat instance because something went wrong during session sync seems
like bad and quite dangerous behaviour.
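
To illustrate the kind of behaviour I am asking for, here is a minimal sketch in
plain Java. It is not Tomcat's code and does not use the Tribes API:
`MessageHandler`, `tolerant` and the class name are hypothetical stand-ins for
whatever processes an incoming session message. The idea is only that a failure
on one message gets logged with its stack trace and skipped, so later messages
are still processed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Tomcat.
public class FaultTolerantDispatch {

    /** Stand-in for whatever processes one replicated session message. */
    public interface MessageHandler {
        void handle(String message);
    }

    /** Wraps a handler so that a failure on one message is recorded and skipped. */
    public static MessageHandler tolerant(MessageHandler delegate, List<Throwable> failures) {
        return message -> {
            try {
                delegate.handle(message);
            } catch (RuntimeException e) {
                failures.add(e);      // keep the exception for later inspection
                e.printStackTrace();  // print the full stack trace to stderr
            }
        };
    }

    public static void main(String[] args) {
        List<String> applied = new ArrayList<>();
        List<Throwable> failures = new ArrayList<>();

        // A handler that fails on a null message, similar in spirit to the NPE above.
        MessageHandler fragile = message -> {
            if (message == null) {
                throw new NullPointerException("null session delta");
            }
            applied.add(message);
        };

        MessageHandler safe = tolerant(fragile, failures);
        safe.handle("delta-1");
        safe.handle(null);      // fails, but is contained by the wrapper
        safe.handle("delta-2"); // still processed

        System.out.println("applied=" + applied + " failures=" + failures.size());
    }
}
```

In this sketch the second message fails but the third is still applied. Whether
skipping a failed session message is acceptable for replication consistency is
a separate question that I can't answer from my side.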
