Hi Shanti!

The heap size is because I start tomcat with -Xmx26000M option that I added. Here's the full command line:
/opt/jdk1.7.0_06//bin/java -Djava.util.logging.config.file=/opt/apache-tomcat-7.0.30/conf/logging.properties -Xmx26000M -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/opt/apache-tomcat-7.0.30/endorsed -classpath /opt/apache-tomcat-7.0.30/bin/bootstrap.jar:/opt/apache-tomcat-7.0.30/bin/tomcat-juli.jar -Dcatalina.base=/opt/apache-tomcat-7.0.30 -Dcatalina.home=/opt/apache-tomcat-7.0.30 -Djava.io.tmpdir=/opt/apache-tomcat-7.0.30/temp org.apache.catalina.startup.Bootstrap start

I will look into your other suggestions and reply shortly.

Thanks,
Kirill

On 10/3/12 9:07 AM, Shanti Suresh wrote:
Hi Kirill,

I am amazed at the size of the Java Heap of 26 GB.  I didn't expect that big a size.  Before I suggest anything else, please take a look at the filesystem space on "/tmp".  The script creates a heapdump everytime it runs as well.  So please make sure that "/tmp/" has sufficient storage.  Otherwise it may bring your service down.

(1) Could you please paste your JVM command-line arguments here?
(2) Also, did you have a chance to observe the JConsole graphs?   I have attached a sample JConsole graph here.
(3) What JDK version are you using?
(4) GC for a Heap of that size may need some tuning.

You may start JConsole as follows on Linux/Unix.
a.Place the following script in your home-directory on the Tomcat app-server machine .
b. Make sure that you have enabled Tomcat to allow JMX access.
c. Then log-in from a remote laptop/desktop into the server (using ssh), and invoke this script from your home-directory:
--------start_jconsole.sh------
DISPLAY_NUMBER=`echo $DISPLAY | /usr/bin/perl -n -e 'm#localhost:(\d+).0# && print $1'`
echo $DISPLAY_NUMBER
XAUTH_ENTRY=`/usr/bin/xauth list | grep "unix:$DISPLAY_NUMBER"`
/usr/bin/jconsole -J-Djava.class.path=/usr/lib/jvm/java/lib/jconsole.jar:/usr/lib/jvm/java/lib/tools.jar
-------------------------------


Inline image 1

Thanks.

                         -Shanti

On Tue, Oct 2, 2012 at 7:43 PM, Kirill Kireyev <kir...@instagrok.com> wrote:
Hi Shanti! Thanks for your script! I ran it (with minor modifications), and it turns out that the garbage collection threads are indeed taking up the majority of CPU (see below). What would you recommend I do to tune/troubleshoot GC further? My server has 32G or RAM, JVM has heap size of 26G

Thanks,
Kirill

PID    CPU    Mem    JStack Info
0x253d    80    90.7    "GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000001938800 nid=0x253d runnable
0x2540    76    90.7    "GC task thread#6 (ParallelGC)" prio=10 tid=0x000000000193e000 nid=0x2540 runnable
0x253b    74    90.7    "GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000001934800 nid=0x253b runnable
0x253e    74    90.7    "GC task thread#4 (ParallelGC)" prio=10 tid=0x000000000193a000 nid=0x253e runnable
0x2541    74    90.7    "GC task thread#7 (ParallelGC)" prio=10 tid=0x000000000193f800 nid=0x2541 runnable
0x253c    66    90.7    "GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000001936800 nid=0x253c runnable
0x253a    47    90.7    "GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000001933000 nid=0x253a runnable
0x253f    39    90.7    "GC task thread#5 (ParallelGC)" prio=10 tid=0x000000000193c000 nid=0x253f runnable
0x2542    19    90.7    "VM Thread" prio=10 tid=0x00000000019b6000 nid=0x2542 runnable
0x2539    0    90.7    "main" prio=10 tid=0x0000000001925000 nid=0x2539 runnable [0x00007fee9d444000]
0x2543    0    90.7    "Reference Handler" daemon prio=10 tid=0x00000000019bd800 nid=0x2543 runnable [0x00007fee500d8000]
0x2544    0    90.7    "Finalizer" daemon prio=10 tid=0x00000000019c5800 nid=0x2544 in Object.wait() [0x00007fee4ffd7000]
0x2545    0    90.7    "Signal Dispatcher" daemon prio=10 tid=0x0000000001a12000 nid=0x2545 runnable [0x0000000000000000]
0x2546    0    90.7    "C2 CompilerThread0" daemon prio=10 tid=0x00007fee48001000 nid=0x2546 waiting on condition [0x0000000000000000]
0x2547    0    90.7    "C2 CompilerThread1" daemon prio=10 tid=0x00007fee48004000 nid=0x2547 waiting on condition [0x0000000000000000]
0x2548    0    90.7    "Service Thread" daemon prio=10 tid=0x0000000001a14800 nid=0x2548 runnable [0x0000000000000000]
0x2549    0    90.7    "VM Periodic Task Thread" prio=10 tid=0x0000000001a1f800 nid=0x2549 waiting on condition
0x254a    0    90.7    "GC Daemon" daemon prio=10 tid=0x00007fee404b3000 nid=0x254a in Object.wait() [0x00007fee4e1bd000]
0x254d    0    90.7    "ContainerBackgroundProcessor[StandardEngine[Catalina]]" daemon prio=10 tid=0x00007fee40389800 nid=0x254d runnable [0x00007fee4d820000]
0x254e    0    90.7    "http-apr-8080-Poller-0" daemon prio=10 tid=0x00007fee40378000 nid=0x254e in Object.wait() [0x00007fee4d661000]
0x254f    0    90.7    "http-apr-8080-CometPoller-0" daemon prio=10 tid=0x00007fee40395000 nid=0x254f in Object.wait() [0x00007fee4d4a2000]
0x2550    0    90.7    "http-apr-8080-Sendfile-0" daemon prio=10 tid=0x00007fee40397000 nid=0x2550 in Object.wait() [0x00007fee4d2b0000]
0x2551    0    90.7    "http-apr-8080-Acceptor-0" daemon prio=10 tid=0x00007fee4072c000 nid=0x2551 runnable [0x00007fee4d1af000]
0x2552    0    90.7    "http-apr-8080-AsyncTimeout" daemon prio=10 tid=0x00007fee4072e000 nid=0x2552 sleeping[0x00007fee4d0ac000]
0x2553    0    90.7    "http-apr-8443-Poller-0" daemon prio=10 tid=0x00007fee40730000 nid=0x2553 in Object.wait() [0x00007fee4cfab000]
0x2554    0    90.7    "http-apr-8443-CometPoller-0" daemon prio=10 tid=0x00007fee40732000 nid=0x2554 in Object.wait() [0x00007fee4cdee000]
0x2555    0    90.7    "http-apr-8443-Acceptor-0" daemon prio=10 tid=0x00007fee40734800 nid=0x2555 runnable [0x00007fee4cced000]
0x2556    0    90.7    "http-apr-8443-AsyncTimeout" daemon prio=10 tid=0x00007fee4073b800 nid=0x2556 sleeping[0x00007fee4cbec000]
0x2557    0    90.7    "ajp-apr-8009-Poller-0" daemon prio=10 tid=0x00007fee4073d800 nid=0x2557 runnable [0x00007fee4ca2f000]
0x2558    0    90.7    "ajp-apr-8009-CometPoller-0" daemon prio=10 tid=0x00007fee4073f800 nid=0x2558 in Object.wait() [0x00007fee4c872000]
0x2559    0    90.7    "ajp-apr-8009-Acceptor-0" daemon prio=10 tid=0x00007fee40741800 nid=0x2559 runnable [0x00007fee4c771000]
0x255a    0    90.7    "ajp-apr-8009-AsyncTimeout" daemon prio=10 tid=0x00007fee40743800 nid=0x255a sleeping[0x00007fee4c670000]
0x255f    0    90.7    "ajp-apr-8009-exec-5" daemon prio=10 tid=0x0000000001cef000 nid=0x255f runnable [0x00007fee4c157000]
0x2560    0    90.7    "ajp-apr-8009-exec-6" daemon prio=10 tid=0x0000000001e93800 nid=0x2560 runnable [0x00007fee47ffd000]
0x2561    0    90.7    "ajp-apr-8009-exec-7" daemon prio=10 tid=0x0000000001c1e800 nid=0x2561 runnable [0x00007fee4db39000]
0x2562    0    90.7    "ajp-apr-8009-exec-8" daemon prio=10 tid=0x0000000001d41000 nid=0x2562 runnable [0x00007fee4dc3a000]
0x2563    0    90.7    "ajp-apr-8009-exec-9" daemon prio=10 tid=0x0000000001bf9800 nid=0x2563 runnable [0x00007fee47efb000]
0x2564    0    90.7    "ajp-apr-8009-exec-10" daemon prio=10 tid=0x0000000001cfe800 nid=0x2564 waiting on condition [0x00007fee47dfa000]
0x2586    0    90.7    "ajp-apr-8009-exec-12" daemon prio=10 tid=0x0000000001e85000 nid=0x2586 waiting on condition [0x00007fee47bf6000]
0x2587    0    90.7    "pool-1-thread-1" prio=10 tid=0x0000000001fe4000 nid=0x2587 waiting on condition [0x00007fee47af3000]
0x2635    0    90.7    "ajp-apr-8009-exec-15" daemon prio=10 tid=0x000000000232d800 nid=0x2635 runnable [0x00007fee38991000]
0x2637    0    90.7    "ajp-apr-8009-exec-17" daemon prio=10 tid=0x0000000002392800 nid=0x2637 waiting on condition [0x00007fee38790000]
0x2638    0    90.7    "ajp-apr-8009-exec-18" daemon prio=10 tid=0x0000000002395000 nid=0x2638 waiting on condition [0x00007fee3868f000]
0x263a    0    90.7    "ajp-apr-8009-exec-20" daemon prio=10 tid=0x0000000002312800 nid=0x263a waiting on condition [0x00007fee3848b000]
0x263d    0    90.7    "ajp-apr-8009-exec-23" daemon prio=10 tid=0x0000000002324800 nid=0x263d runnable [0x00007fee38187000]
0x50b    0    90.7    "http-apr-8443-exec-1" daemon prio=10 tid=0x0000000002209000 nid=0x50b waiting on condition [0x00007fee232b1000]
0x50c    0    90.7    "http-apr-8443-exec-2" daemon prio=10 tid=0x000000000213c000 nid=0x50c waiting on condition [0x00007fee233b2000]
0x50d    0    90.7    "http-apr-8443-exec-3" daemon prio=10 tid=0x000000000239e000 nid=0x50d waiting on condition [0x00007fee22baa000]
0x50e    0    90.7    "http-apr-8443-exec-4" daemon prio=10 tid=0x0000000002aeb800 nid=0x50e waiting on condition [0x00007fee3939d000]
0x50f    0    90.7    "http-apr-8443-exec-5" daemon prio=10 tid=0x0000000002ad3000 nid=0x50f waiting on condition [0x00007fee38f99000]
0x510    0    90.7    "http-apr-8443-exec-6" daemon prio=10 tid=0x00000000025a1800 nid=0x510 waiting on condition [0x00007fee397a1000]
0x511    0    90.7    "http-apr-8443-exec-7" daemon prio=10 tid=0x0000000002188800 nid=0x511 waiting on condition [0x00007fee45db8000]
0x512    0    90.7    "http-apr-8443-exec-8" daemon prio=10 tid=0x00000000023b3800 nid=0x512 waiting on condition [0x00007fee44aa5000]
0x513    0    90.7    "http-apr-8443-exec-9" daemon prio=10 tid=0x0000000002164800 nid=0x513 waiting on condition [0x00007fee3dae4000]
0x514    0    90.7    "http-apr-8443-exec-10" daemon prio=10 tid=0x0000000002af6800 nid=0x514 waiting on condition [0x00007fee279f8000]
0x19a7    0    90.7    "http-apr-8080-exec-1" daemon prio=10 tid=0x0000000001ac2800 nid=0x19a7 waiting on condition [0x00007fee3747b000]
0x19a8    0    90.7    "http-apr-8080-exec-2" daemon prio=10 tid=0x000000000257d000 nid=0x19a8 waiting on condition [0x00007fee3767d000]
0x28b0    0    90.7    "http-apr-8080-exec-3" daemon prio=10 tid=0x000000000274c000 nid=0x28b0 waiting on condition [0x00007fee3f8f7000]
0x28b1    0    90.7    "http-apr-8080-exec-4" daemon prio=10 tid=0x00000000021c8800 nid=0x28b1 waiting on condition [0x00007fee3adb7000]
0x46e7    0    90.7    "Keep-Alive-Timer" daemon prio=10 tid=0x0000000002160800 nid=0x46e7 runnable [0x00007fee3d5df000]
0x29c5    0    90.7    "Keep-Alive-SocketCleaner" daemon prio=10 tid=0x000000000242b000 nid=0x29c5 in Object.wait() [0x00007fee1f877000]
0x6ef8    0    90.7    "http-apr-8080-exec-5" daemon prio=10 tid=0x0000000002a56000 nid=0x6ef8 waiting on condition [0x00007fee454af000]
0x6ef9    0    90.7    "http-apr-8080-exec-6" daemon prio=10 tid=0x00000000022de800 nid=0x6ef9 waiting on condition [0x00007fee45ab5000]
0x3d38    0    90.7    "http-apr-8080-exec-7" daemon prio=10 tid=0x0000000002233000 nid=0x3d38 waiting on condition [0x00007fee37980000]
0x3d39    0    90.7    "http-apr-8080-exec-8" daemon prio=10 tid=0x00000000021f7800 nid=0x3d39 waiting on condition [0x00007fee38087000]
0x64a3    0    90.7    "ajp-apr-8009-exec-36" daemon prio=10 tid=0x00000000022dd000 nid=0x64a3 waiting on condition [0x00007fee35158000]
0x64a6    0    90.7    "ajp-apr-8009-exec-39" daemon prio=10 tid=0x0000000002add800 nid=0x64a6 runnable [0x00007fee3bdc7000]
0x64a7    0    90.7    "ajp-apr-8009-exec-40" daemon prio=10 tid=0x00000000029c0800 nid=0x64a7 waiting on condition [0x00007fee24fce000]
0x64ab    0    90.7    "ajp-apr-8009-exec-44" daemon prio=10 tid=0x0000000002924000 nid=0x64ab runnable [0x00007fee20f8d000]
0x64ac    0    90.7    "ajp-apr-8009-exec-45" daemon prio=10 tid=0x0000000002555800 nid=0x64ac waiting on condition [0x00007fee256d5000]
0x64af    0    90.7    "ajp-apr-8009-exec-48" daemon prio=10 tid=0x0000000001f8a800 nid=0x64af waiting on condition [0x00007fee24ccb000]
0x64b0    0    90.7    "ajp-apr-8009-exec-49" daemon prio=10 tid=0x000000000257b000 nid=0x64b0 runnable [0x00007fee36d73000]
0x64b1    0    90.7    "ajp-apr-8009-exec-50" daemon prio=10 tid=0x0000000001bb3000 nid=0x64b1 runnable [0x00007fee36b71000]
0x64b2    0    90.7    "ajp-apr-8009-exec-51" daemon prio=10 tid=0x000000000224a000 nid=0x64b2 runnable [0x00007fee270ee000]
0x64b3    0    90.7    "ajp-apr-8009-exec-52" daemon prio=10 tid=0x000000000216e800 nid=0x64b3 runnable [0x00007fee20c8a000]
0xf1b    0    90.7    "http-apr-8080-exec-9" daemon prio=10 tid=0x0000000001c9e800 nid=0xf1b waiting on condition [0x00007fee3929c000]
0xf1c    0    90.7    "http-apr-8080-exec-10" daemon prio=10 tid=0x0000000001e4c000 nid=0xf1c waiting on condition [0x00007fee3d8e2000]
0x3eda    0    90.7    "ajp-apr-8009-exec-55" daemon prio=10 tid=0x0000000002155000 nid=0x3eda runnable [0x00007fee27cfb000]
0x3edb    0    90.7    "ajp-apr-8009-exec-56" daemon prio=10 tid=0x000000000232f800 nid=0x3edb runnable [0x00007fee27bfa000]
0x51c7    0    90.7    "ajp-apr-8009-exec-57" daemon prio=10 tid=0x000000000211c800 nid=0x51c7 waiting on condition [0x00007fee37c83000]



On 9/27/12 12:17 PM, Shanti Suresh wrote:
Hi Kirill,

Like Mark, Bill and Jeff said, those threads are normal request-processing
threads.  I have included a script that might help with isolating high CPU
issues with Tomcat.

Also, I think it might be helpful to see how the Java heap is performing as
well.
Please bring up Jconsole and let it run over the week.  Inspect the graphs
for Memory, CPU and threads.  Since you say that high CPU occurs
intermittently several times during the week and clears itself, I wonder if
it is somehow related with the garbage collection options you are using for
the server.  Or it may be a code-related problem.

Things to look at may include:

(1) Are high CPU times related to Java heap reductions happening at the
same time?  ==> GC possibly needs tuning
(2) Are high CPU times related to increase in thread usage?  ==> possible
livelock in looping code?
(3) how many network connections come into the Tomcat server during
high-CPU times?    Possible overload-related?

Here is the script.  I made a couple of small changes, for e.g., changing
the username.  But didn't test it after the change.  During high-CPU times,
invoke the script a few times, say 30 seconds apart.  And then compare the
thread-dumps.  I like to use TDA for thread-dump analysis of Tomcat
thread-dumps.

Mark, et al, please feel free to help me refine this script.  I would like
to have a script to catch STUCK threads too :-)  Let me know if anyone has
a script already.  Thanks.

--------------high_cpu_diagnostics.pl:-----
#!/usr/bin/perl
#

use Cwd;

# Make a dated directory for capturing current diagnostics
my ($sec,$min,$hour,$mday,$mon,$year,
          $wday,$yday,$isdst) = localtime time;
$year += 1900;
$mon += 1;
my $pwd = cwd();
my $preview_diag_dir = "/tmp/Preview_Diag.$year-$mon-$mday-$hour:$min:$sec";
print "$preview_diag_dir\n";
mkdir $preview_diag_dir, 0755;
chdir($preview_diag_dir) or die "Can't chdir into $preview_diag_dir $!\n";

# Capture Preview thread dump
my $process_pattern = "preview";
my $preview_pid = `/usr/bin/pgrep -f $process_pattern`;
my $login = getpwuid($<) ;
if (kill 0, $preview_pid){
        #Possible to send a signal to the Preview Tomcat - either "webinf"
or "root"
        my $count = kill 3, $preview_pid;
}else {
        # Not possible to send a signal to the VCM - use "sudo"
        system ("/usr/bin/sudo /bin/kill -3 $preview_pid");
}

# Capture Preview thread dump
system ("/usr/bin/jmap
-dump:format=b,file=$preview_diag_dir/preview_heapdump.hprof $preview_pid");

# Gather the top threads; keep around for reference on what other threads
are running
@top_cmd = ("/usr/bin/top",  "-H", "-n1", "-b");
@sort_cmd = ("/bin/sort", "-r", "-n", "-k", "9,9");
@sed_cmd = ("/bin/sed", "-n", "'8,$p'");
system("@top_cmd 1> top_all_threads.log");

# Get your tomcat user's threads, i.e. threads of user, "webinf"
system('/usr/bin/tail -n+6 top_all_threads.log | /bin/sort -r -n -k "9,9" |
/bin/grep webinf top_all_threads.log 1> top_user_webinf_threads.log');

# Get the thread dump
my @output=`/usr/bin/jstack -l ${preview_pid}`;
open (my $file, '>', 'preview_threaddump.txt') or die "Could not open file:
$!";
print $file @output;
close $file;

open LOG, "top_user_webinf_threads.log" or die $!;
open (STDOUT, "| tee -ai top_cpu_preview_threads.log");
print "PID\tCPU\tMem\tJStack Info\n";
while ($l = <LOG>) {
    chop $l;
    $pid = $l;
    $pid =~ s/webinf.*//g;
    $pid =~ s/ *//g;
##  Hex PID is available in the Sun HotSpot Stack Trace */
    $hex_pid = sprintf("%#x", $pid);
    @values = split(/\s+/, $l);
    $pct = $values[8];
    $mem = $values[9];
# Debugger breakpoint:
$DB::single = 1;

# Find the Java thread that corresponds to the thread-id from the TOP output
    for my $j (@output) {
        chop $j;
        ($j =~ m/nid=$hex_pid/)  &&  print $hex_pid . "\t" . $pct . "\t" .
$mem . "\t" .  $j . "\n";
    }
}

close (STDOUT);

close LOG;


------------------end of script --------------------------

Thanks.

                      -Shanti


On Thu, Sep 27, 2012 at 2:11 PM, Bill Miller <
millebi.subscripti...@gmail.com> wrote:

I agree; we have reproducible instances where PermGen is not set to our
requirements on the Tomcat startup parameters and it will cause a "lockup"
every time. Do some JMX monitoring and you may discover a memory spike
that's killing Tomcat.

Bill
-----Original Message-----
From: Jeff MAURY [mailto:jeffma...@gmail.com]
Sent: September-27-2012 2:01 PM
To: Tomcat Users List
Subject: Re: high CPU usage on tomcat 7

This is probably due to out of memory, I have the same problem on my ubuntu
ci machine Did you monitor your tomcat with jmx ?

Jeff


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org




--
Kirill Kireyev, PhD
Founder/CTO instaGrok.com
kir...@instagrok.com
Twitter: @instaGrok
FB: facebook.com/instagrok



--
Kirill Kireyev, PhD
Founder/CTO instaGrok.com
kir...@instagrok.com
Twitter: @instaGrok
FB: facebook.com/instagrok

Reply via email to