Hi Kirill, I am amazed at the size of the Java Heap of 26 GB. I didn't expect that big a size. Before I suggest anything else, please take a look at the filesystem space on "/tmp". The script creates a heapdump everytime it runs as well. So please make sure that "/tmp/" has sufficient storage. Otherwise it may bring your service down.
(1) Could you please paste your JVM command-line arguments here? (2) Also, did you have a chance to observe the JConsole graphs? I have attached a sample JConsole graph here. (3) What JDK version are you using? (4) GC for a Heap of that size may need some tuning. You may start JConsole as follows on Linux/Unix. a.Place the following script in your home-directory on the Tomcat app-server machine . b. Make sure that you have enabled Tomcat to allow JMX access. c. Then log-in from a remote laptop/desktop into the server (using ssh), and invoke this script from your home-directory: --------start_jconsole.sh------ DISPLAY_NUMBER=`echo $DISPLAY | /usr/bin/perl -n -e 'm#localhost:(\d+).0# && print $1'` echo $DISPLAY_NUMBER XAUTH_ENTRY=`/usr/bin/xauth list | grep "unix:$DISPLAY_NUMBER"` /usr/bin/jconsole -J-Djava.class.path=/usr/lib/jvm/java/lib/jconsole.jar:/usr/lib/jvm/java/lib/tools.jar ------------------------------- [image: Inline image 1] Thanks. -Shanti On Tue, Oct 2, 2012 at 7:43 PM, Kirill Kireyev <kir...@instagrok.com> wrote: > Hi Shanti! Thanks for your script! I ran it (with minor modifications), > and it turns out that the garbage collection threads are indeed taking up > the majority of CPU (see below). What would you recommend I do to > tune/troubleshoot GC further? My server has 32G or RAM, JVM has heap size > of 26G > > Thanks, > Kirill > > PID CPU Mem JStack Info > 0x253d 80 90.7 "GC task thread#3 (ParallelGC)" prio=10 > tid=0x0000000001938800 nid=0x253d runnable > 0x2540 76 90.7 "GC task thread#6 (ParallelGC)" prio=10 > tid=0x000000000193e000 nid=0x2540 runnable > 0x253b 74 90.7 "GC task thread#1 (ParallelGC)" prio=10 > tid=0x0000000001934800 nid=0x253b runnable > 0x253e 74 90.7 "GC task thread#4 (ParallelGC)" prio=10 > tid=0x000000000193a000 nid=0x253e runnable > 0x2541 74 90.7 "GC task thread#7 (ParallelGC)" prio=10 > tid=0x000000000193f800 nid=0x2541 runnable > 0x253c 66 90.7 "GC task thread#2 (ParallelGC)" prio=10 > tid=0x0000000001936800 nid=0x253c runnable > 0x253a 47 90.7 "GC task thread#0 (ParallelGC)" prio=10 > tid=0x0000000001933000 nid=0x253a runnable > 0x253f 39 90.7 "GC task thread#5 (ParallelGC)" prio=10 > tid=0x000000000193c000 nid=0x253f runnable > 0x2542 19 90.7 "VM Thread" prio=10 tid=0x00000000019b6000 > nid=0x2542 runnable > 0x2539 0 90.7 "main" prio=10 tid=0x0000000001925000 nid=0x2539 > runnable [0x00007fee9d444000] > 0x2543 0 90.7 "Reference Handler" daemon prio=10 > tid=0x00000000019bd800 nid=0x2543 runnable [0x00007fee500d8000] > 0x2544 0 90.7 "Finalizer" daemon prio=10 tid=0x00000000019c5800 > nid=0x2544 in Object.wait() [0x00007fee4ffd7000] > 0x2545 0 90.7 "Signal Dispatcher" daemon prio=10 > tid=0x0000000001a12000 nid=0x2545 runnable [0x0000000000000000] > 0x2546 0 90.7 "C2 CompilerThread0" daemon prio=10 > tid=0x00007fee48001000 nid=0x2546 waiting on condition [0x0000000000000000] > 0x2547 0 90.7 "C2 CompilerThread1" daemon prio=10 > tid=0x00007fee48004000 nid=0x2547 waiting on condition [0x0000000000000000] > 0x2548 0 90.7 "Service Thread" daemon prio=10 > tid=0x0000000001a14800 nid=0x2548 runnable [0x0000000000000000] > 0x2549 0 90.7 "VM Periodic Task Thread" prio=10 > tid=0x0000000001a1f800 nid=0x2549 waiting on condition > 0x254a 0 90.7 "GC Daemon" daemon prio=10 tid=0x00007fee404b3000 > nid=0x254a in Object.wait() [0x00007fee4e1bd000] > 0x254d 0 90.7 > "ContainerBackgroundProcessor[StandardEngine[Catalina]]" daemon prio=10 > tid=0x00007fee40389800 nid=0x254d runnable [0x00007fee4d820000] > 0x254e 0 90.7 "http-apr-8080-Poller-0" daemon prio=10 > tid=0x00007fee40378000 nid=0x254e in Object.wait() [0x00007fee4d661000] > 0x254f 0 90.7 "http-apr-8080-CometPoller-0" daemon prio=10 > tid=0x00007fee40395000 nid=0x254f in Object.wait() [0x00007fee4d4a2000] > 0x2550 0 90.7 "http-apr-8080-Sendfile-0" daemon prio=10 > tid=0x00007fee40397000 nid=0x2550 in Object.wait() [0x00007fee4d2b0000] > 0x2551 0 90.7 "http-apr-8080-Acceptor-0" daemon prio=10 > tid=0x00007fee4072c000 nid=0x2551 runnable [0x00007fee4d1af000] > 0x2552 0 90.7 "http-apr-8080-AsyncTimeout" daemon prio=10 > tid=0x00007fee4072e000 nid=0x2552 sleeping[0x00007fee4d0ac000] > 0x2553 0 90.7 "http-apr-8443-Poller-0" daemon prio=10 > tid=0x00007fee40730000 nid=0x2553 in Object.wait() [0x00007fee4cfab000] > 0x2554 0 90.7 "http-apr-8443-CometPoller-0" daemon prio=10 > tid=0x00007fee40732000 nid=0x2554 in Object.wait() [0x00007fee4cdee000] > 0x2555 0 90.7 "http-apr-8443-Acceptor-0" daemon prio=10 > tid=0x00007fee40734800 nid=0x2555 runnable [0x00007fee4cced000] > 0x2556 0 90.7 "http-apr-8443-AsyncTimeout" daemon prio=10 > tid=0x00007fee4073b800 nid=0x2556 sleeping[0x00007fee4cbec000] > 0x2557 0 90.7 "ajp-apr-8009-Poller-0" daemon prio=10 > tid=0x00007fee4073d800 nid=0x2557 runnable [0x00007fee4ca2f000] > 0x2558 0 90.7 "ajp-apr-8009-CometPoller-0" daemon prio=10 > tid=0x00007fee4073f800 nid=0x2558 in Object.wait() [0x00007fee4c872000] > 0x2559 0 90.7 "ajp-apr-8009-Acceptor-0" daemon prio=10 > tid=0x00007fee40741800 nid=0x2559 runnable [0x00007fee4c771000] > 0x255a 0 90.7 "ajp-apr-8009-AsyncTimeout" daemon prio=10 > tid=0x00007fee40743800 nid=0x255a sleeping[0x00007fee4c670000] > 0x255f 0 90.7 "ajp-apr-8009-exec-5" daemon prio=10 > tid=0x0000000001cef000 nid=0x255f runnable [0x00007fee4c157000] > 0x2560 0 90.7 "ajp-apr-8009-exec-6" daemon prio=10 > tid=0x0000000001e93800 nid=0x2560 runnable [0x00007fee47ffd000] > 0x2561 0 90.7 "ajp-apr-8009-exec-7" daemon prio=10 > tid=0x0000000001c1e800 nid=0x2561 runnable [0x00007fee4db39000] > 0x2562 0 90.7 "ajp-apr-8009-exec-8" daemon prio=10 > tid=0x0000000001d41000 nid=0x2562 runnable [0x00007fee4dc3a000] > 0x2563 0 90.7 "ajp-apr-8009-exec-9" daemon prio=10 > tid=0x0000000001bf9800 nid=0x2563 runnable [0x00007fee47efb000] > 0x2564 0 90.7 "ajp-apr-8009-exec-10" daemon prio=10 > tid=0x0000000001cfe800 nid=0x2564 waiting on condition [0x00007fee47dfa000] > 0x2586 0 90.7 "ajp-apr-8009-exec-12" daemon prio=10 > tid=0x0000000001e85000 nid=0x2586 waiting on condition [0x00007fee47bf6000] > 0x2587 0 90.7 "pool-1-thread-1" prio=10 tid=0x0000000001fe4000 > nid=0x2587 waiting on condition [0x00007fee47af3000] > 0x2635 0 90.7 "ajp-apr-8009-exec-15" daemon prio=10 > tid=0x000000000232d800 nid=0x2635 runnable [0x00007fee38991000] > 0x2637 0 90.7 "ajp-apr-8009-exec-17" daemon prio=10 > tid=0x0000000002392800 nid=0x2637 waiting on condition [0x00007fee38790000] > 0x2638 0 90.7 "ajp-apr-8009-exec-18" daemon prio=10 > tid=0x0000000002395000 nid=0x2638 waiting on condition [0x00007fee3868f000] > 0x263a 0 90.7 "ajp-apr-8009-exec-20" daemon prio=10 > tid=0x0000000002312800 nid=0x263a waiting on condition [0x00007fee3848b000] > 0x263d 0 90.7 "ajp-apr-8009-exec-23" daemon prio=10 > tid=0x0000000002324800 nid=0x263d runnable [0x00007fee38187000] > 0x50b 0 90.7 "http-apr-8443-exec-1" daemon prio=10 > tid=0x0000000002209000 nid=0x50b waiting on condition [0x00007fee232b1000] > 0x50c 0 90.7 "http-apr-8443-exec-2" daemon prio=10 > tid=0x000000000213c000 nid=0x50c waiting on condition [0x00007fee233b2000] > 0x50d 0 90.7 "http-apr-8443-exec-3" daemon prio=10 > tid=0x000000000239e000 nid=0x50d waiting on condition [0x00007fee22baa000] > 0x50e 0 90.7 "http-apr-8443-exec-4" daemon prio=10 > tid=0x0000000002aeb800 nid=0x50e waiting on condition [0x00007fee3939d000] > 0x50f 0 90.7 "http-apr-8443-exec-5" daemon prio=10 > tid=0x0000000002ad3000 nid=0x50f waiting on condition [0x00007fee38f99000] > 0x510 0 90.7 "http-apr-8443-exec-6" daemon prio=10 > tid=0x00000000025a1800 nid=0x510 waiting on condition [0x00007fee397a1000] > 0x511 0 90.7 "http-apr-8443-exec-7" daemon prio=10 > tid=0x0000000002188800 nid=0x511 waiting on condition [0x00007fee45db8000] > 0x512 0 90.7 "http-apr-8443-exec-8" daemon prio=10 > tid=0x00000000023b3800 nid=0x512 waiting on condition [0x00007fee44aa5000] > 0x513 0 90.7 "http-apr-8443-exec-9" daemon prio=10 > tid=0x0000000002164800 nid=0x513 waiting on condition [0x00007fee3dae4000] > 0x514 0 90.7 "http-apr-8443-exec-10" daemon prio=10 > tid=0x0000000002af6800 nid=0x514 waiting on condition [0x00007fee279f8000] > 0x19a7 0 90.7 "http-apr-8080-exec-1" daemon prio=10 > tid=0x0000000001ac2800 nid=0x19a7 waiting on condition [0x00007fee3747b000] > 0x19a8 0 90.7 "http-apr-8080-exec-2" daemon prio=10 > tid=0x000000000257d000 nid=0x19a8 waiting on condition [0x00007fee3767d000] > 0x28b0 0 90.7 "http-apr-8080-exec-3" daemon prio=10 > tid=0x000000000274c000 nid=0x28b0 waiting on condition [0x00007fee3f8f7000] > 0x28b1 0 90.7 "http-apr-8080-exec-4" daemon prio=10 > tid=0x00000000021c8800 nid=0x28b1 waiting on condition [0x00007fee3adb7000] > 0x46e7 0 90.7 "Keep-Alive-Timer" daemon prio=10 > tid=0x0000000002160800 nid=0x46e7 runnable [0x00007fee3d5df000] > 0x29c5 0 90.7 "Keep-Alive-SocketCleaner" daemon prio=10 > tid=0x000000000242b000 nid=0x29c5 in Object.wait() [0x00007fee1f877000] > 0x6ef8 0 90.7 "http-apr-8080-exec-5" daemon prio=10 > tid=0x0000000002a56000 nid=0x6ef8 waiting on condition [0x00007fee454af000] > 0x6ef9 0 90.7 "http-apr-8080-exec-6" daemon prio=10 > tid=0x00000000022de800 nid=0x6ef9 waiting on condition [0x00007fee45ab5000] > 0x3d38 0 90.7 "http-apr-8080-exec-7" daemon prio=10 > tid=0x0000000002233000 nid=0x3d38 waiting on condition [0x00007fee37980000] > 0x3d39 0 90.7 "http-apr-8080-exec-8" daemon prio=10 > tid=0x00000000021f7800 nid=0x3d39 waiting on condition [0x00007fee38087000] > 0x64a3 0 90.7 "ajp-apr-8009-exec-36" daemon prio=10 > tid=0x00000000022dd000 nid=0x64a3 waiting on condition [0x00007fee35158000] > 0x64a6 0 90.7 "ajp-apr-8009-exec-39" daemon prio=10 > tid=0x0000000002add800 nid=0x64a6 runnable [0x00007fee3bdc7000] > 0x64a7 0 90.7 "ajp-apr-8009-exec-40" daemon prio=10 > tid=0x00000000029c0800 nid=0x64a7 waiting on condition [0x00007fee24fce000] > 0x64ab 0 90.7 "ajp-apr-8009-exec-44" daemon prio=10 > tid=0x0000000002924000 nid=0x64ab runnable [0x00007fee20f8d000] > 0x64ac 0 90.7 "ajp-apr-8009-exec-45" daemon prio=10 > tid=0x0000000002555800 nid=0x64ac waiting on condition [0x00007fee256d5000] > 0x64af 0 90.7 "ajp-apr-8009-exec-48" daemon prio=10 > tid=0x0000000001f8a800 nid=0x64af waiting on condition [0x00007fee24ccb000] > 0x64b0 0 90.7 "ajp-apr-8009-exec-49" daemon prio=10 > tid=0x000000000257b000 nid=0x64b0 runnable [0x00007fee36d73000] > 0x64b1 0 90.7 "ajp-apr-8009-exec-50" daemon prio=10 > tid=0x0000000001bb3000 nid=0x64b1 runnable [0x00007fee36b71000] > 0x64b2 0 90.7 "ajp-apr-8009-exec-51" daemon prio=10 > tid=0x000000000224a000 nid=0x64b2 runnable [0x00007fee270ee000] > 0x64b3 0 90.7 "ajp-apr-8009-exec-52" daemon prio=10 > tid=0x000000000216e800 nid=0x64b3 runnable [0x00007fee20c8a000] > 0xf1b 0 90.7 "http-apr-8080-exec-9" daemon prio=10 > tid=0x0000000001c9e800 nid=0xf1b waiting on condition [0x00007fee3929c000] > 0xf1c 0 90.7 "http-apr-8080-exec-10" daemon prio=10 > tid=0x0000000001e4c000 nid=0xf1c waiting on condition [0x00007fee3d8e2000] > 0x3eda 0 90.7 "ajp-apr-8009-exec-55" daemon prio=10 > tid=0x0000000002155000 nid=0x3eda runnable [0x00007fee27cfb000] > 0x3edb 0 90.7 "ajp-apr-8009-exec-56" daemon prio=10 > tid=0x000000000232f800 nid=0x3edb runnable [0x00007fee27bfa000] > 0x51c7 0 90.7 "ajp-apr-8009-exec-57" daemon prio=10 > tid=0x000000000211c800 nid=0x51c7 waiting on condition [0x00007fee37c83000] > > > > On 9/27/12 12:17 PM, Shanti Suresh wrote: > > Hi Kirill, > > Like Mark, Bill and Jeff said, those threads are normal request-processing > threads. I have included a script that might help with isolating high CPU > issues with Tomcat. > > Also, I think it might be helpful to see how the Java heap is performing as > well. > Please bring up Jconsole and let it run over the week. Inspect the graphs > for Memory, CPU and threads. Since you say that high CPU occurs > intermittently several times during the week and clears itself, I wonder if > it is somehow related with the garbage collection options you are using for > the server. Or it may be a code-related problem. > > Things to look at may include: > > (1) Are high CPU times related to Java heap reductions happening at the > same time? ==> GC possibly needs tuning > (2) Are high CPU times related to increase in thread usage? ==> possible > livelock in looping code? > (3) how many network connections come into the Tomcat server during > high-CPU times? Possible overload-related? > > Here is the script. I made a couple of small changes, for e.g., changing > the username. But didn't test it after the change. During high-CPU times, > invoke the script a few times, say 30 seconds apart. And then compare the > thread-dumps. I like to use TDA for thread-dump analysis of Tomcat > thread-dumps. > > Mark, et al, please feel free to help me refine this script. I would like > to have a script to catch STUCK threads too :-) Let me know if anyone has > a script already. Thanks. > > --------------high_cpu_diagnostics.pl:----- > #!/usr/bin/perl > # > > use Cwd; > > # Make a dated directory for capturing current diagnostics > my ($sec,$min,$hour,$mday,$mon,$year, > $wday,$yday,$isdst) = localtime time; > $year += 1900; > $mon += 1; > my $pwd = cwd(); > my $preview_diag_dir = "/tmp/Preview_Diag.$year-$mon-$mday-$hour:$min:$sec"; > print "$preview_diag_dir\n"; > mkdir $preview_diag_dir, 0755; > chdir($preview_diag_dir) or die "Can't chdir into $preview_diag_dir $!\n"; > > # Capture Preview thread dump > my $process_pattern = "preview"; > my $preview_pid = `/usr/bin/pgrep -f $process_pattern`; > my $login = getpwuid($<) ; > if (kill 0, $preview_pid){ > #Possible to send a signal to the Preview Tomcat - either "webinf" > or "root" > my $count = kill 3, $preview_pid; > }else { > # Not possible to send a signal to the VCM - use "sudo" > system ("/usr/bin/sudo /bin/kill -3 $preview_pid"); > } > > # Capture Preview thread dump > system ("/usr/bin/jmap > -dump:format=b,file=$preview_diag_dir/preview_heapdump.hprof $preview_pid"); > > # Gather the top threads; keep around for reference on what other threads > are running > @top_cmd = ("/usr/bin/top", "-H", "-n1", "-b"); > @sort_cmd = ("/bin/sort", "-r", "-n", "-k", "9,9"); > @sed_cmd = ("/bin/sed", "-n", "'8,$p'"); > system("@top_cmd 1> top_all_threads.log"); > > # Get your tomcat user's threads, i.e. threads of user, "webinf" > system('/usr/bin/tail -n+6 top_all_threads.log | /bin/sort -r -n -k "9,9" | > /bin/grep webinf top_all_threads.log 1> top_user_webinf_threads.log'); > > # Get the thread dump > my @output=`/usr/bin/jstack -l ${preview_pid}`; > open (my $file, '>', 'preview_threaddump.txt') or die "Could not open file: > $!"; > print $file @output; > close $file; > > open LOG, "top_user_webinf_threads.log" or die $!; > open (STDOUT, "| tee -ai top_cpu_preview_threads.log"); > print "PID\tCPU\tMem\tJStack Info\n"; > while ($l = <LOG>) { > chop $l; > $pid = $l; > $pid =~ s/webinf.*//g; > $pid =~ s/ *//g; > ## Hex PID is available in the Sun HotSpot Stack Trace */ > $hex_pid = sprintf("%#x", $pid); > @values = split(/\s+/, $l); > $pct = $values[8]; > $mem = $values[9]; > # Debugger breakpoint: > $DB::single = 1; > > # Find the Java thread that corresponds to the thread-id from the TOP output > for my $j (@output) { > chop $j; > ($j =~ m/nid=$hex_pid/) && print $hex_pid . "\t" . $pct . "\t" . > $mem . "\t" . $j . "\n"; > } > } > > close (STDOUT); > > close LOG; > > > ------------------end of script -------------------------- > > Thanks. > > -Shanti > > > On Thu, Sep 27, 2012 at 2:11 PM, Bill Miller > <millebi.subscripti...@gmail.com> wrote: > > > I agree; we have reproducible instances where PermGen is not set to our > requirements on the Tomcat startup parameters and it will cause a "lockup" > every time. Do some JMX monitoring and you may discover a memory spike > that's killing Tomcat. > > Bill > -----Original Message----- > From: Jeff MAURY [mailto:jeffma...@gmail.com <jeffma...@gmail.com>] > Sent: September-27-2012 2:01 PM > To: Tomcat Users List > Subject: Re: high CPU usage on tomcat 7 > > This is probably due to out of memory, I have the same problem on my ubuntu > ci machine Did you monitor your tomcat with jmx ? > > Jeff > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org > For additional commands, e-mail: users-h...@tomcat.apache.org > > > > -- > *Kirill Kireyev, PhD* > Founder/CTO instaGrok.com <http://www.instagrok.com> > kir...@instagrok.com > Twitter: @instaGrok <http://twitter.com/InstaGrok> > FB: facebook.com/instagrok <http://www.facebook.com/instagrok> > <http://www.instagrok.com> >