Hi,

Without knowing what your application is doing, here is something you should look at:
External dependencies. If your application's performance is stable at steady state for a few thousand GC cycles, and then its resource usage jumps out of nowhere, it is very possible that this is due to an external interruption. Possible interruptions:

1. Another process that runs on the same machine as the Tomcat server: a cron job, log file compression, you name it. Go over the process list and cron to start with. We run performance tests only on very clean machines: bare-bones Linux with Java and Tomcat and nothing else.

2. If your app uses an external system, like a relational database, web service, other HTTP server, or email system, that system may be the periodic bottleneck. If it has a periodic interruption (like a job that runs on that machine, or just its own behavior under constant load), it may block your threads for a short amount of time, causing the thread count to increase.

3. Something that runs periodically inside your application. Same as above, but not external. This may also be a third-party library that your app uses and that does periodic work without you calling it directly.

4. Get rid of all webapps other than the one you are testing. A few come with Tomcat by default. Make sure your Tomcat runs only the app you are testing. conf/Catalina and all its subdirectories should contain at most one XML file, and only if you need a context file. All your web app directories together should contain a single WAR file and/or its expanded directory.

5. If your system actually consists of several WAR files, load test each one of them separately to find the troublemaker. When you test one, the others should not be installed on Tomcat (i.e., delete the WAR file and its sibling directory). Not just stopped: NOT INSTALLED. Big difference.

Try to do a thread dump while the jump is happening, if you can catch that.
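Since the timing is hard to hit by hand, one option is to let the JVM take the dump for you. Below is a minimal sketch, not a drop-in tool: it polls the live thread count through the standard ThreadMXBean and prints a full dump when the count grows past a factor of the count seen at startup. The class name ThreadJumpWatcher, the factor, and the polling interval are all made up for illustration, and the sketch assumes it runs inside the same JVM as Tomcat (for example, started once from your web app). If you can react fast enough manually, running jstack against the Tomcat pid, or sending it kill -3, gives you the same kind of information.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: name, thresholds, and output target are assumptions.
final class ThreadJumpWatcher {

    /** True when the current count exceeds the baseline by more than the given factor. */
    static boolean isJump(int baseline, int current, double factor) {
        return current > baseline * factor;
    }

    /** Builds a text dump of all live threads with their full stack traces. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip lock details to keep the dump cheap.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Starts a daemon thread that polls the thread count and dumps on a jump. */
    static Thread startDaemon(long intervalMillis, double factor) {
        Thread t = new Thread(() -> {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            int baseline = mx.getThreadCount(); // count at startup
            try {
                while (true) {
                    int current = mx.getThreadCount();
                    if (isJump(baseline, current, factor)) {
                        System.err.println(dumpAllThreads());
                        baseline = current; // do not dump again on every poll
                    }
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop quietly when interrupted
            }
        }, "thread-jump-watcher");
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Standalone demo: print one dump of this JVM's threads and exit.
    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
    }
}
```

To use it inside Tomcat you would call something like ThreadJumpWatcher.startDaemon(1000, 1.5) once at application startup. The dump goes to standard error, which on a default Tomcat setup usually ends up in catalina.out. Start the watcher only after the load test has reached steady state, otherwise the baseline will be too low and the normal ramp-up will trigger a dump.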
If you can get a thread dump while the number of threads is increasing (difficult timing, from what you describe), you will see what the threads are doing at that point, and at least one of them should give you some insight into why they cannot handle the momentary load.

It is quite a coincidence that, as we speak, I am doing exactly that: load testing Tomcat on different EC2 combinations :)

E