Hi Kirill,

I am amazed at the size of the Java Heap of 26 GB.  I didn't expect that
big a size.  Before I suggest anything else, please take a look at the
filesystem space on "/tmp".  The script creates a heapdump everytime it
runs as well.  So please make sure that "/tmp/" has sufficient storage.
Otherwise it may bring your service down.

(1) Could you please paste your JVM command-line arguments here?
(2) Also, did you have a chance to observe the JConsole graphs?   I have
attached a sample JConsole graph here.
(3) What JDK version are you using?
(4) GC for a Heap of that size may need some tuning.

You may start JConsole as follows on Linux/Unix.
a.Place the following script in your home-directory on the Tomcat
app-server machine .
b. Make sure that you have enabled Tomcat to allow JMX access.
c. Then log-in from a remote laptop/desktop into the server (using ssh),
and invoke this script from your home-directory:
--------start_jconsole.sh------
DISPLAY_NUMBER=`echo $DISPLAY | /usr/bin/perl -n -e 'm#localhost:(\d+).0#
&& print $1'`
echo $DISPLAY_NUMBER
XAUTH_ENTRY=`/usr/bin/xauth list | grep "unix:$DISPLAY_NUMBER"`
/usr/bin/jconsole
-J-Djava.class.path=/usr/lib/jvm/java/lib/jconsole.jar:/usr/lib/jvm/java/lib/tools.jar
-------------------------------


[image: Inline image 1]

Thanks.

                         -Shanti

On Tue, Oct 2, 2012 at 7:43 PM, Kirill Kireyev <kir...@instagrok.com> wrote:

>  Hi Shanti! Thanks for your script! I ran it (with minor modifications),
> and it turns out that the garbage collection threads are indeed taking up
> the majority of CPU (see below). What would you recommend I do to
> tune/troubleshoot GC further? My server has 32G or RAM, JVM has heap size
> of 26G
>
> Thanks,
> Kirill
>
> PID    CPU    Mem    JStack Info
> 0x253d    80    90.7    "GC task thread#3 (ParallelGC)" prio=10
> tid=0x0000000001938800 nid=0x253d runnable
> 0x2540    76    90.7    "GC task thread#6 (ParallelGC)" prio=10
> tid=0x000000000193e000 nid=0x2540 runnable
> 0x253b    74    90.7    "GC task thread#1 (ParallelGC)" prio=10
> tid=0x0000000001934800 nid=0x253b runnable
> 0x253e    74    90.7    "GC task thread#4 (ParallelGC)" prio=10
> tid=0x000000000193a000 nid=0x253e runnable
> 0x2541    74    90.7    "GC task thread#7 (ParallelGC)" prio=10
> tid=0x000000000193f800 nid=0x2541 runnable
> 0x253c    66    90.7    "GC task thread#2 (ParallelGC)" prio=10
> tid=0x0000000001936800 nid=0x253c runnable
> 0x253a    47    90.7    "GC task thread#0 (ParallelGC)" prio=10
> tid=0x0000000001933000 nid=0x253a runnable
> 0x253f    39    90.7    "GC task thread#5 (ParallelGC)" prio=10
> tid=0x000000000193c000 nid=0x253f runnable
> 0x2542    19    90.7    "VM Thread" prio=10 tid=0x00000000019b6000
> nid=0x2542 runnable
> 0x2539    0    90.7    "main" prio=10 tid=0x0000000001925000 nid=0x2539
> runnable [0x00007fee9d444000]
> 0x2543    0    90.7    "Reference Handler" daemon prio=10
> tid=0x00000000019bd800 nid=0x2543 runnable [0x00007fee500d8000]
> 0x2544    0    90.7    "Finalizer" daemon prio=10 tid=0x00000000019c5800
> nid=0x2544 in Object.wait() [0x00007fee4ffd7000]
> 0x2545    0    90.7    "Signal Dispatcher" daemon prio=10
> tid=0x0000000001a12000 nid=0x2545 runnable [0x0000000000000000]
> 0x2546    0    90.7    "C2 CompilerThread0" daemon prio=10
> tid=0x00007fee48001000 nid=0x2546 waiting on condition [0x0000000000000000]
> 0x2547    0    90.7    "C2 CompilerThread1" daemon prio=10
> tid=0x00007fee48004000 nid=0x2547 waiting on condition [0x0000000000000000]
> 0x2548    0    90.7    "Service Thread" daemon prio=10
> tid=0x0000000001a14800 nid=0x2548 runnable [0x0000000000000000]
> 0x2549    0    90.7    "VM Periodic Task Thread" prio=10
> tid=0x0000000001a1f800 nid=0x2549 waiting on condition
> 0x254a    0    90.7    "GC Daemon" daemon prio=10 tid=0x00007fee404b3000
> nid=0x254a in Object.wait() [0x00007fee4e1bd000]
> 0x254d    0    90.7
> "ContainerBackgroundProcessor[StandardEngine[Catalina]]" daemon prio=10
> tid=0x00007fee40389800 nid=0x254d runnable [0x00007fee4d820000]
> 0x254e    0    90.7    "http-apr-8080-Poller-0" daemon prio=10
> tid=0x00007fee40378000 nid=0x254e in Object.wait() [0x00007fee4d661000]
> 0x254f    0    90.7    "http-apr-8080-CometPoller-0" daemon prio=10
> tid=0x00007fee40395000 nid=0x254f in Object.wait() [0x00007fee4d4a2000]
> 0x2550    0    90.7    "http-apr-8080-Sendfile-0" daemon prio=10
> tid=0x00007fee40397000 nid=0x2550 in Object.wait() [0x00007fee4d2b0000]
> 0x2551    0    90.7    "http-apr-8080-Acceptor-0" daemon prio=10
> tid=0x00007fee4072c000 nid=0x2551 runnable [0x00007fee4d1af000]
> 0x2552    0    90.7    "http-apr-8080-AsyncTimeout" daemon prio=10
> tid=0x00007fee4072e000 nid=0x2552 sleeping[0x00007fee4d0ac000]
> 0x2553    0    90.7    "http-apr-8443-Poller-0" daemon prio=10
> tid=0x00007fee40730000 nid=0x2553 in Object.wait() [0x00007fee4cfab000]
> 0x2554    0    90.7    "http-apr-8443-CometPoller-0" daemon prio=10
> tid=0x00007fee40732000 nid=0x2554 in Object.wait() [0x00007fee4cdee000]
> 0x2555    0    90.7    "http-apr-8443-Acceptor-0" daemon prio=10
> tid=0x00007fee40734800 nid=0x2555 runnable [0x00007fee4cced000]
> 0x2556    0    90.7    "http-apr-8443-AsyncTimeout" daemon prio=10
> tid=0x00007fee4073b800 nid=0x2556 sleeping[0x00007fee4cbec000]
> 0x2557    0    90.7    "ajp-apr-8009-Poller-0" daemon prio=10
> tid=0x00007fee4073d800 nid=0x2557 runnable [0x00007fee4ca2f000]
> 0x2558    0    90.7    "ajp-apr-8009-CometPoller-0" daemon prio=10
> tid=0x00007fee4073f800 nid=0x2558 in Object.wait() [0x00007fee4c872000]
> 0x2559    0    90.7    "ajp-apr-8009-Acceptor-0" daemon prio=10
> tid=0x00007fee40741800 nid=0x2559 runnable [0x00007fee4c771000]
> 0x255a    0    90.7    "ajp-apr-8009-AsyncTimeout" daemon prio=10
> tid=0x00007fee40743800 nid=0x255a sleeping[0x00007fee4c670000]
> 0x255f    0    90.7    "ajp-apr-8009-exec-5" daemon prio=10
> tid=0x0000000001cef000 nid=0x255f runnable [0x00007fee4c157000]
> 0x2560    0    90.7    "ajp-apr-8009-exec-6" daemon prio=10
> tid=0x0000000001e93800 nid=0x2560 runnable [0x00007fee47ffd000]
> 0x2561    0    90.7    "ajp-apr-8009-exec-7" daemon prio=10
> tid=0x0000000001c1e800 nid=0x2561 runnable [0x00007fee4db39000]
> 0x2562    0    90.7    "ajp-apr-8009-exec-8" daemon prio=10
> tid=0x0000000001d41000 nid=0x2562 runnable [0x00007fee4dc3a000]
> 0x2563    0    90.7    "ajp-apr-8009-exec-9" daemon prio=10
> tid=0x0000000001bf9800 nid=0x2563 runnable [0x00007fee47efb000]
> 0x2564    0    90.7    "ajp-apr-8009-exec-10" daemon prio=10
> tid=0x0000000001cfe800 nid=0x2564 waiting on condition [0x00007fee47dfa000]
> 0x2586    0    90.7    "ajp-apr-8009-exec-12" daemon prio=10
> tid=0x0000000001e85000 nid=0x2586 waiting on condition [0x00007fee47bf6000]
> 0x2587    0    90.7    "pool-1-thread-1" prio=10 tid=0x0000000001fe4000
> nid=0x2587 waiting on condition [0x00007fee47af3000]
> 0x2635    0    90.7    "ajp-apr-8009-exec-15" daemon prio=10
> tid=0x000000000232d800 nid=0x2635 runnable [0x00007fee38991000]
> 0x2637    0    90.7    "ajp-apr-8009-exec-17" daemon prio=10
> tid=0x0000000002392800 nid=0x2637 waiting on condition [0x00007fee38790000]
> 0x2638    0    90.7    "ajp-apr-8009-exec-18" daemon prio=10
> tid=0x0000000002395000 nid=0x2638 waiting on condition [0x00007fee3868f000]
> 0x263a    0    90.7    "ajp-apr-8009-exec-20" daemon prio=10
> tid=0x0000000002312800 nid=0x263a waiting on condition [0x00007fee3848b000]
> 0x263d    0    90.7    "ajp-apr-8009-exec-23" daemon prio=10
> tid=0x0000000002324800 nid=0x263d runnable [0x00007fee38187000]
> 0x50b    0    90.7    "http-apr-8443-exec-1" daemon prio=10
> tid=0x0000000002209000 nid=0x50b waiting on condition [0x00007fee232b1000]
> 0x50c    0    90.7    "http-apr-8443-exec-2" daemon prio=10
> tid=0x000000000213c000 nid=0x50c waiting on condition [0x00007fee233b2000]
> 0x50d    0    90.7    "http-apr-8443-exec-3" daemon prio=10
> tid=0x000000000239e000 nid=0x50d waiting on condition [0x00007fee22baa000]
> 0x50e    0    90.7    "http-apr-8443-exec-4" daemon prio=10
> tid=0x0000000002aeb800 nid=0x50e waiting on condition [0x00007fee3939d000]
> 0x50f    0    90.7    "http-apr-8443-exec-5" daemon prio=10
> tid=0x0000000002ad3000 nid=0x50f waiting on condition [0x00007fee38f99000]
> 0x510    0    90.7    "http-apr-8443-exec-6" daemon prio=10
> tid=0x00000000025a1800 nid=0x510 waiting on condition [0x00007fee397a1000]
> 0x511    0    90.7    "http-apr-8443-exec-7" daemon prio=10
> tid=0x0000000002188800 nid=0x511 waiting on condition [0x00007fee45db8000]
> 0x512    0    90.7    "http-apr-8443-exec-8" daemon prio=10
> tid=0x00000000023b3800 nid=0x512 waiting on condition [0x00007fee44aa5000]
> 0x513    0    90.7    "http-apr-8443-exec-9" daemon prio=10
> tid=0x0000000002164800 nid=0x513 waiting on condition [0x00007fee3dae4000]
> 0x514    0    90.7    "http-apr-8443-exec-10" daemon prio=10
> tid=0x0000000002af6800 nid=0x514 waiting on condition [0x00007fee279f8000]
> 0x19a7    0    90.7    "http-apr-8080-exec-1" daemon prio=10
> tid=0x0000000001ac2800 nid=0x19a7 waiting on condition [0x00007fee3747b000]
> 0x19a8    0    90.7    "http-apr-8080-exec-2" daemon prio=10
> tid=0x000000000257d000 nid=0x19a8 waiting on condition [0x00007fee3767d000]
> 0x28b0    0    90.7    "http-apr-8080-exec-3" daemon prio=10
> tid=0x000000000274c000 nid=0x28b0 waiting on condition [0x00007fee3f8f7000]
> 0x28b1    0    90.7    "http-apr-8080-exec-4" daemon prio=10
> tid=0x00000000021c8800 nid=0x28b1 waiting on condition [0x00007fee3adb7000]
> 0x46e7    0    90.7    "Keep-Alive-Timer" daemon prio=10
> tid=0x0000000002160800 nid=0x46e7 runnable [0x00007fee3d5df000]
> 0x29c5    0    90.7    "Keep-Alive-SocketCleaner" daemon prio=10
> tid=0x000000000242b000 nid=0x29c5 in Object.wait() [0x00007fee1f877000]
> 0x6ef8    0    90.7    "http-apr-8080-exec-5" daemon prio=10
> tid=0x0000000002a56000 nid=0x6ef8 waiting on condition [0x00007fee454af000]
> 0x6ef9    0    90.7    "http-apr-8080-exec-6" daemon prio=10
> tid=0x00000000022de800 nid=0x6ef9 waiting on condition [0x00007fee45ab5000]
> 0x3d38    0    90.7    "http-apr-8080-exec-7" daemon prio=10
> tid=0x0000000002233000 nid=0x3d38 waiting on condition [0x00007fee37980000]
> 0x3d39    0    90.7    "http-apr-8080-exec-8" daemon prio=10
> tid=0x00000000021f7800 nid=0x3d39 waiting on condition [0x00007fee38087000]
> 0x64a3    0    90.7    "ajp-apr-8009-exec-36" daemon prio=10
> tid=0x00000000022dd000 nid=0x64a3 waiting on condition [0x00007fee35158000]
> 0x64a6    0    90.7    "ajp-apr-8009-exec-39" daemon prio=10
> tid=0x0000000002add800 nid=0x64a6 runnable [0x00007fee3bdc7000]
> 0x64a7    0    90.7    "ajp-apr-8009-exec-40" daemon prio=10
> tid=0x00000000029c0800 nid=0x64a7 waiting on condition [0x00007fee24fce000]
> 0x64ab    0    90.7    "ajp-apr-8009-exec-44" daemon prio=10
> tid=0x0000000002924000 nid=0x64ab runnable [0x00007fee20f8d000]
> 0x64ac    0    90.7    "ajp-apr-8009-exec-45" daemon prio=10
> tid=0x0000000002555800 nid=0x64ac waiting on condition [0x00007fee256d5000]
> 0x64af    0    90.7    "ajp-apr-8009-exec-48" daemon prio=10
> tid=0x0000000001f8a800 nid=0x64af waiting on condition [0x00007fee24ccb000]
> 0x64b0    0    90.7    "ajp-apr-8009-exec-49" daemon prio=10
> tid=0x000000000257b000 nid=0x64b0 runnable [0x00007fee36d73000]
> 0x64b1    0    90.7    "ajp-apr-8009-exec-50" daemon prio=10
> tid=0x0000000001bb3000 nid=0x64b1 runnable [0x00007fee36b71000]
> 0x64b2    0    90.7    "ajp-apr-8009-exec-51" daemon prio=10
> tid=0x000000000224a000 nid=0x64b2 runnable [0x00007fee270ee000]
> 0x64b3    0    90.7    "ajp-apr-8009-exec-52" daemon prio=10
> tid=0x000000000216e800 nid=0x64b3 runnable [0x00007fee20c8a000]
> 0xf1b    0    90.7    "http-apr-8080-exec-9" daemon prio=10
> tid=0x0000000001c9e800 nid=0xf1b waiting on condition [0x00007fee3929c000]
> 0xf1c    0    90.7    "http-apr-8080-exec-10" daemon prio=10
> tid=0x0000000001e4c000 nid=0xf1c waiting on condition [0x00007fee3d8e2000]
> 0x3eda    0    90.7    "ajp-apr-8009-exec-55" daemon prio=10
> tid=0x0000000002155000 nid=0x3eda runnable [0x00007fee27cfb000]
> 0x3edb    0    90.7    "ajp-apr-8009-exec-56" daemon prio=10
> tid=0x000000000232f800 nid=0x3edb runnable [0x00007fee27bfa000]
> 0x51c7    0    90.7    "ajp-apr-8009-exec-57" daemon prio=10
> tid=0x000000000211c800 nid=0x51c7 waiting on condition [0x00007fee37c83000]
>
>
>
> On 9/27/12 12:17 PM, Shanti Suresh wrote:
>
> Hi Kirill,
>
> Like Mark, Bill and Jeff said, those threads are normal request-processing
> threads.  I have included a script that might help with isolating high CPU
> issues with Tomcat.
>
> Also, I think it might be helpful to see how the Java heap is performing as
> well.
> Please bring up Jconsole and let it run over the week.  Inspect the graphs
> for Memory, CPU and threads.  Since you say that high CPU occurs
> intermittently several times during the week and clears itself, I wonder if
> it is somehow related with the garbage collection options you are using for
> the server.  Or it may be a code-related problem.
>
> Things to look at may include:
>
> (1) Are high CPU times related to Java heap reductions happening at the
> same time?  ==> GC possibly needs tuning
> (2) Are high CPU times related to increase in thread usage?  ==> possible
> livelock in looping code?
> (3) how many network connections come into the Tomcat server during
> high-CPU times?    Possible overload-related?
>
> Here is the script.  I made a couple of small changes, for e.g., changing
> the username.  But didn't test it after the change.  During high-CPU times,
> invoke the script a few times, say 30 seconds apart.  And then compare the
> thread-dumps.  I like to use TDA for thread-dump analysis of Tomcat
> thread-dumps.
>
> Mark, et al, please feel free to help me refine this script.  I would like
> to have a script to catch STUCK threads too :-)  Let me know if anyone has
> a script already.  Thanks.
>
> --------------high_cpu_diagnostics.pl:-----
> #!/usr/bin/perl
> #
>
> use Cwd;
>
> # Make a dated directory for capturing current diagnostics
> my ($sec,$min,$hour,$mday,$mon,$year,
>           $wday,$yday,$isdst) = localtime time;
> $year += 1900;
> $mon += 1;
> my $pwd = cwd();
> my $preview_diag_dir = "/tmp/Preview_Diag.$year-$mon-$mday-$hour:$min:$sec";
> print "$preview_diag_dir\n";
> mkdir $preview_diag_dir, 0755;
> chdir($preview_diag_dir) or die "Can't chdir into $preview_diag_dir $!\n";
>
> # Capture Preview thread dump
> my $process_pattern = "preview";
> my $preview_pid = `/usr/bin/pgrep -f $process_pattern`;
> my $login = getpwuid($<) ;
> if (kill 0, $preview_pid){
>         #Possible to send a signal to the Preview Tomcat - either "webinf"
> or "root"
>         my $count = kill 3, $preview_pid;
> }else {
>         # Not possible to send a signal to the VCM - use "sudo"
>         system ("/usr/bin/sudo /bin/kill -3 $preview_pid");
> }
>
> # Capture Preview thread dump
> system ("/usr/bin/jmap
> -dump:format=b,file=$preview_diag_dir/preview_heapdump.hprof $preview_pid");
>
> # Gather the top threads; keep around for reference on what other threads
> are running
> @top_cmd = ("/usr/bin/top",  "-H", "-n1", "-b");
> @sort_cmd = ("/bin/sort", "-r", "-n", "-k", "9,9");
> @sed_cmd = ("/bin/sed", "-n", "'8,$p'");
> system("@top_cmd 1> top_all_threads.log");
>
> # Get your tomcat user's threads, i.e. threads of user, "webinf"
> system('/usr/bin/tail -n+6 top_all_threads.log | /bin/sort -r -n -k "9,9" |
> /bin/grep webinf top_all_threads.log 1> top_user_webinf_threads.log');
>
> # Get the thread dump
> my @output=`/usr/bin/jstack -l ${preview_pid}`;
> open (my $file, '>', 'preview_threaddump.txt') or die "Could not open file:
> $!";
> print $file @output;
> close $file;
>
> open LOG, "top_user_webinf_threads.log" or die $!;
> open (STDOUT, "| tee -ai top_cpu_preview_threads.log");
> print "PID\tCPU\tMem\tJStack Info\n";
> while ($l = <LOG>) {
>     chop $l;
>     $pid = $l;
>     $pid =~ s/webinf.*//g;
>     $pid =~ s/ *//g;
> ##  Hex PID is available in the Sun HotSpot Stack Trace */
>     $hex_pid = sprintf("%#x", $pid);
>     @values = split(/\s+/, $l);
>     $pct = $values[8];
>     $mem = $values[9];
> # Debugger breakpoint:
> $DB::single = 1;
>
> # Find the Java thread that corresponds to the thread-id from the TOP output
>     for my $j (@output) {
>         chop $j;
>         ($j =~ m/nid=$hex_pid/)  &&  print $hex_pid . "\t" . $pct . "\t" .
> $mem . "\t" .  $j . "\n";
>     }
> }
>
> close (STDOUT);
>
> close LOG;
>
>
> ------------------end of script --------------------------
>
> Thanks.
>
>                       -Shanti
>
>
> On Thu, Sep 27, 2012 at 2:11 PM, Bill Miller 
> <millebi.subscripti...@gmail.com> wrote:
>
>
>  I agree; we have reproducible instances where PermGen is not set to our
> requirements on the Tomcat startup parameters and it will cause a "lockup"
> every time. Do some JMX monitoring and you may discover a memory spike
> that's killing Tomcat.
>
> Bill
> -----Original Message-----
> From: Jeff MAURY [mailto:jeffma...@gmail.com <jeffma...@gmail.com>]
> Sent: September-27-2012 2:01 PM
> To: Tomcat Users List
> Subject: Re: high CPU usage on tomcat 7
>
> This is probably due to out of memory, I have the same problem on my ubuntu
> ci machine Did you monitor your tomcat with jmx ?
>
> Jeff
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
>
>
>
> --
> *Kirill Kireyev, PhD*
> Founder/CTO instaGrok.com <http://www.instagrok.com>
> kir...@instagrok.com
> Twitter: @instaGrok <http://twitter.com/InstaGrok>
> FB: facebook.com/instagrok <http://www.facebook.com/instagrok>
>  <http://www.instagrok.com>
>

Reply via email to