OK, I have some information to add here. I was able to get thread dumps with "jstack <pid>" (very handy!) because "kill -3" wouldn't work on the hung process. I've listed three consecutive test runs, each with the thread that was blocked at the time the process hung.
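As an aside, for anyone who can't attach jstack: here is a minimal sketch of collecting similar per-thread state and stack information from inside the JVM, assuming Java 5 or later. The class name ThreadDumpSketch is made up for illustration and is not part of our app. The IDs and states it prints are the Java-level ones (Thread.getId() and Thread.getState()), so they may not match what jstack reports, and it may well not get a chance to run if the process is truly wedged.

```java
import java.util.Map;

public class ThreadDumpSketch {

    /**
     * Formats the state and stack of every live thread, loosely imitating
     * the layout of the jstack output. Hypothetical helper for illustration.
     */
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        // Thread.getAllStackTraces() returns a snapshot of all live threads.
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            // Java-level thread id and state, not the native id jstack may show.
            sb.append("Thread ").append(t.getId())
              .append(" \"").append(t.getName()).append("\"")
              .append(": (state = ").append(t.getState()).append(")\n");
            for (StackTraceElement frame : entry.getValue()) {
                sb.append(" - ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Calling ThreadDumpSketch.dump() a few times, a few seconds apart, and comparing the results would be one way to see whether the same thread stays stuck in the same frames.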

Test 1

Thread 21348: (state = BLOCKED)
 - java.lang.StringCoding$CharsetSE.encode(char[], int, int) @bci=15, line=334 (Compiled frame)
 - java.lang.StringCoding.encode(java.lang.String, char[], int, int) @bci=123, line=378 (Compiled frame)
 - java.lang.String.getBytes(java.lang.String) @bci=25, line=812 (Compiled frame)
 - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 (Compiled frame)
 - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 (Compiled frame)
 - java.io.File.exists() @bci=20, line=702 (Compiled frame)

This is the main thread and it is hung on startup during the Spring initialization process.

Test 2

Thread 24193: (state = BLOCKED)
 - java.lang.String.<init>(char[], int, int) @bci=50, line=208 (Compiled frame)
 - java.io.DataInputStream.readUTF() @bci=1, line=522 (Compiled frame)
 - org.aspectj.apache.bcel.classfile.ClassParser.readConstantPool() @bci=9, line=186 (Interpreted frame)
 - org.aspectj.apache.bcel.classfile.ClassParser.parse() @bci=9, line=131 (Interpreted frame)
 - org.aspectj.apache.bcel.util.NonCachingClassLoaderRepository.loadClass(java.lang.String) @bci=100, line=226 (Compiled frame)
 - org.aspectj.apache.bcel.util.NonCachingClassLoaderRepository.loadClass(java.lang.Class) @bci=5, line=237 (Interpreted frame)
 - org.aspectj.weaver.reflect.Java15AnnotationFinder.getParameterAnnotationTypes(java.lang.reflect.Member) @bci=21, line=353 (Compiled frame)

Once again hung on an I/O operation, this time in String's constructor.

Test 3

Thread 24365: (state = BLOCKED)
 - java.lang.String.<init>(char[], int, int) @bci=50, line=208 (Compiled frame)
 - net.sf.cglib.proxy.Enhancer.access$300(net.sf.cglib.proxy.Enhancer, net.sf.cglib.core.CodeEmitter, int) @bci=3, line=60 (Interpreted frame)
 - net.sf.cglib.proxy.Enhancer$6.emitCallback(net.sf.cglib.core.CodeEmitter, int) @bci=6, line=913 (Interpreted frame)
 - net.sf.cglib.proxy.InvocationHandlerGenerator.generate(net.sf.cglib.core.ClassEmitter, net.sf.cglib.proxy.CallbackGenerator$Context, java.util.List) @bci=85, line=44 (Compiled frame)
 - net.sf.cglib.proxy.Enhancer.emitMethods(net.sf.cglib.core.ClassEmitter, java.util.List, java.util.List) @bci=415, line=942 (Interpreted frame)
 - net.sf.cglib.proxy.Enhancer.generateClass(net.sf.cglib.asm.ClassVisitor) @bci=353, line=498 (Interpreted frame)
 - net.sf.cglib.core.DefaultGeneratorStrategy.generate(net.sf.cglib.core.ClassGenerator) @bci=11, line=25 (Interpreted frame)
 - net.sf.cglib.core.AbstractClassGenerator.create(java.lang.Object) @bci=182, line=216 (Interpreted frame)
 - net.sf.cglib.proxy.Enhancer.createHelper() @bci=105, line=377 (Interpreted frame)
 - net.sf.cglib.proxy.Enhancer.createClass() @bci=6, line=317 (Interpreted frame)

with the last lines in the Catalina log reading:

2009-08-31 14:35:46,261 INFO [SettingsFactory] : Named query checking : enabled
2009-08-31 14:35:46,284 INFO [SessionFactoryImpl] : building session factory

In all examples, the software appears to be hung on fairly innocuous methods in the Java framework itself. Tests 1 and 2 both originate from calls in the IO package, and the third test from the CGLIB framework that we use from Hibernate. In all instances, there is a call to a method/constructor on the String object in the stack.

I'm about to try Java 6 to see if it makes a difference.

Thanks,
Bradley

On Sun, Aug 30, 2009 at 2:56 AM, Peter Crowther <peter.crowt...@melandra.com> wrote:

> 2009/8/28 Bradley Wagner <bradley.wag...@hannonhill.com>
>
> > I have a Spring/Hibernate app running in Tomcat 5.5.20 that we've tested in
> > many environments that is currently faltering when running in a Ubuntu 7
> > VM.
> > Basically on startup, frequently, the startup process will halt when trying
> > to read Hibernate's HBM files and in various other places in startup
> > process. When it stops both CPUs are pegged at 200% usage by the java
> > process (from 'top').
> > At this point, the only recourse is to kill the app
> > with "kill -9 <pid>". Occasionally the app will start all the way up. Then,
> > I can trigger a re-index of the content in the app's database using Lucene
> > and the app will freeze again.
> >
> > The only thing I've been able to find in common about these operations is
> > that they seem to be heavy I/O.
>
> One trick would be to take a thread dump when the app locks up. kill -3
> <pid> will trigger such a dump. The output will be in one of Tomcat's
> logfiles (catalina.out by default, I think). That might allow you to get
> debugging information, as you're getting the data without having to enable
> JPDA.
>
> Sometimes it's useful to take several thread dumps, a few seconds apart, and
> analyse them. However, given your suspicions of a race condition, I'm not
> sure that's appropriate in this case as perturbing the system during startup
> might prevent the race condition.
>
> > A few notable environment variables
> > - JAVA_OPTS="-Xmx512M -XX:MaxPermSize=128m -Djava.awt.headless=true
> >   -Dfile.encoding=UTF-8"
> > - running Sun's Java 1.5.0.16
> > - running Ubuntu 4.2.3-2ubuntu7 (from "cat /proc/version")
>
> Thanks! Far too few posters provide this information without prompting.
>
> Are you stuck on Java 1.5? 1.6 is generally faster, though you may prefer
> to stay with a version that you've tested with your app.
>
> > Strangely, I tried to debug the application by enabling JPDA:
> > JAVA_OPTS="-Xmx512M -XX:MaxPermSize=128m -Xdebug
> > -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n
> > -Djava.awt.headless=true -Dfile.encoding=UTF-8"
> > and the application starts up flawlessly. In fact, when running with JPDA
> > enabled, it's impossible to get it to freeze even when doing the content
> > re-index.
>
> Chuck would know better, but I wonder whether enabling JPDA disables some
> optimisations in the JVM and/or the compiler.
>
> > Any debugging next steps or ideas are appreciated.
> >
> > Should I look at:
> > - different ubuntu versions
> > - different Java versions
> > - different Tomcat versions
>
> My first step would be to see whether a thread dump gives you any useful
> information. I'd suggest at least going to the latest 5.5 release anyway
> (5.5.28), as there have been some security fixes since 5.5.20 came out. You
> might wish to try the latest major version (6.0.20) and see how well it
> behaves; it depends how much testing you want/need to do!
>
> > Is the fact that it works with JPDA indicative of some kind of race
> > condition possibly?
>
> See above - I suspect JPDA changes several things inside the JVM, but I'm
> far from being the Java guru on this list.
>
> - Peter

--
Hannon Hill - CMS Experience You Can Trust
(678) 904-6900 ext 115
http://www.hannonhill.com