On Apr 4, 2013, at 6:43 AM, Andy Pahne wrote:

> An application that has been running fine for years now suddenly does perform
> with varying results, sometimes as quick as always, but then sometimes a
> simple page request uses up to 30 seconds.

If you haven't changed anything in the application or your Tomcat configuration, then you'll want to look at the external resources that your application depends upon, such as a database, the network, shared file systems, etc. If the performance of an external resource is suffering, it could definitely be causing problems for your application.

> Since the performance did degrade we regularly find log items like the
> following one in catalina.out (many of them, about 100 to 300 per hour on
> each host):
>
> 04.04.2013 11:51:53
> org.apache.catalina.tribes.group.interceptors.TcpFailureDetector
> memberDisappeared
> INFO: Verification complete. Member still
> alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64, -88, 6,
> 21}:4000,{-64, -88, 6, 21},4000, alive=1706334,id={-99 120 -58 21 -84 121 74
> 45 -104 -73 -123 -40 10 -76 70 59 }, payload={}, command={}, domain={}, ]]

I think that you'll typically see these when there is a network issue, but you would see them any time a member times out.

The connections between the nodes in your cluster are monitored with a heartbeat. When a node doesn't respond to the heartbeat, the node is considered to have left the cluster. To protect against false positives you can configure a TcpFailureDetector. It listens for "memberDisappeared" events and, when one occurs, connects to the member via TCP to try to confirm its disappearance.

In your case, the message that you are seeing indicates that the heartbeat failed, but that the TcpFailureDetector was able to verify that the node still exists. In other words, this is a false positive.

In addition to the TcpFailureDetector, you can also adjust the "frequency" and "dropTime" attributes of the Membership element to control how often heartbeats are sent and how long a member may go without a heartbeat before it is dropped. You might try adjusting these settings to make the configuration more tolerant of your network.

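As a rough sketch of what that could look like in server.xml (this is not your actual configuration, which I haven't seen): the class names, multicast address and port below are the defaults from the Tomcat 6 clustering documentation, and the dropTime of 6000 ms is only an illustrative value that I picked, not a recommendation. The documented defaults are frequency="500" and dropTime="3000", both in milliseconds.

```xml
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster">
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <!-- frequency: interval in ms between heartbeats.
         dropTime: ms without a heartbeat before a member is dropped.
         address and port are the documented defaults and may differ in your setup.
         dropTime="6000" is an assumed example value, double the default. -->
    <Membership className="org.apache.catalina.tribes.membership.McastService"
                address="228.0.0.4"
                port="45564"
                frequency="500"
                dropTime="6000"/>

    <!-- Verifies a "memberDisappeared" event over TCP before the member is removed. -->
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
  </Channel>
</Cluster>
```

A larger dropTime makes the cluster slower to notice a node that really has gone away, so it is a trade-off rather than a fix for the underlying network problem.
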
https://tomcat.apache.org/tomcat-6.0-doc/config/cluster-membership.html

> We ruled out that the recent changes to said application are the cause for
> the poor performance by simulating all sorts of heavy load on various test
> systems. It just works nicely in the test environment. However, on production
> it does not.
>
> We are using the SimpleTcpCluster solution for clustering on Tomcat 6. The
> cluster has two nodes.

It would be helpful to post your cluster configuration, minus comments, as well as the exact version of Tomcat that you are running.

> I am NOT suspecting a tomcat bug. And as I said I am not suspecting a
> performance bottleneck in our application or in the db queries it performs.
> At the moment I am thinking of a hardware failure of some kind (network
> interface, router etc.).
>
> Do you have any experience with this problem and what did you do to resolve
> it?

If you suspect a network issue, you could try capturing the traffic between the two nodes with Wireshark or tcpdump. Analysis of the packets could show whether there is a problem, for example lost or delayed heartbeat packets. Another option would be to use a tool like iperf to put a high load on your network and possibly trigger the problem.

Dan

> Thanks,
> Andy
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org