Thanks for the suggestions, Chris. Unfortunately, it was the OldGen heap area that was exhausted, not PermGen, and that doesn't show up in the Catalina log.
The heap allocation is quite hefty as this is a 64-bit environment. We need to get our developers to look into the application behaviour, but in the meantime I was looking for a way of dealing with the problem.

-----Original Message-----
From: Christopher Schultz [mailto:ch...@christopherschultz.net]
Sent: 24 September 2009 19:59
To: Tomcat Users List
Subject: Re: Clustering Question...

Darren,

On 9/24/2009 10:21 AM, Darren Kukulka wrote:
> In a 2-node scenario, where both nodes are configured identically and
> load balanced via Apache based on availability, how can we configure the
> cluster to deal with situations where one node has exhausted its Old Gen
> heap allocation?

Hmm... this is a situation that is difficult to detect using remote code (like mod_jk).

> In such situations we've observed that the application being served by
> the cluster slows down considerably. I can understand why this would be
> the case for sessions on the degraded node, but why would sessions on
> the good node suffer?

Are you using session replication? If so, the "good" Tomcat may be slowing down while attempting to replicate session changes to the "damaged" Tomcat, which is either not responding, responding slowly, or responding in confusing ways.

> How can we modify our configuration to deal with such occurrences more
> effectively?

After we had some trouble with OOMEs in production (legit ones, actually: we just needed more heap), I implemented a quick-and-dirty OOME checker. All it does is "grep OutOfMemoryError catalina.out" and, if found, sends an email to someone.

Instead of emailing, you could have your OOME checker actually shut down (or forcibly terminate) the damaged Tomcat, and then the cluster should stabilize. With only two nodes, this might be a problem, as the good Tomcat will take over and might, under the new load of 100% of your traffic, experience its own PermGen exhaustion and also be shut down.

You should consider adjusting your PermGen allocation (duh!) and perhaps your heap allocation as well.

-chris
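
For reference, the log check Chris describes ("grep OutOfMemoryError catalina.out", then notify someone) could be sketched roughly as below. This is a minimal sketch in Python, not his actual script. The function names and the on_oome callback are hypothetical, and what the callback does (send an email, stop the damaged Tomcat) is left to the caller. It can only catch errors that are actually written to the log, so it would not help where the exhaustion never shows up there.

```python
from pathlib import Path
from typing import Callable, List

# The text to look for, as in "grep OutOfMemoryError catalina.out".
MARKER = "OutOfMemoryError"


def find_oome_lines(log_path: Path) -> List[str]:
    """Return every line of the log file that mentions OutOfMemoryError."""
    with log_path.open("r", encoding="utf-8", errors="replace") as log:
        return [line.rstrip("\n") for line in log if MARKER in line]


def check_log(log_path: Path, on_oome: Callable[[List[str]], None]) -> bool:
    """Scan the log once and call on_oome with the matching lines, if any.

    on_oome is a hypothetical hook: it might send an email or shut down
    the damaged node. Returns True when at least one match was found.
    """
    hits = find_oome_lines(log_path)
    if hits:
        on_oome(hits)
    return bool(hits)
```

Run from cron or a similar scheduler, check_log(Path("catalina.out"), notify) would call notify only when the marker is present. Note that it rescans the whole file each time, so an old error keeps triggering until the log is rotated.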