> -----Original Message-----
> From: George I. Develekos [mailto:gdevele...@omilia.com]
> Sent: Wednesday, November 16, 2016 10:18 AM
> To: users@tomcat.apache.org
> Subject: Re: Please help with Tomcat Garbage Collection
> 
> I appreciate the detailed response.
> 
> On another installation with higher load, the JVM has "selected" to give
> YoungGen 250MB or so (as opposed to 150MB here), and I have confirmed that
> Full-GC is much less frequent, so I'll go with that next time I have downtime.
> 
> When exactly can I expect to see my app freeze? Isn't it during Full-GC?
> 
> Does the increase in CPU during those Full-GC times make sense?
> 
> 
> 
> Regards,
> 
> George I. Develekos | Senior SW Engineer | t: +30.210.6930664 | e:
> gdevele...@omilia.com
> 
> ------------------------------------------------------------------------
> 
> Technology that Listens, Understands and Cares - www.omilia.com
> 
> On 11/16/2016 6:11 PM, john.e.gr...@wellsfargo.com wrote:
> >
> > Sorry for top posting.  The format got weird.
> >
> > Those numbers aren’t bad.  The most important number for me is the
> > throughput on the summary tab.  Yours is 99.38%.  That means the JVM
> > was doing real work 99.38% of the time and garbage collecting the
> > other .62%.  You could improve that if you worked hard enough, but
> > going from 99.4 to 99.5 or 99.6 probably isn’t worth the effort.  If
> > the number was 90% or something, then you’d have more room for
> > improvement.
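> >
> > (Over the roughly 19-hour log, that 0.62% works out to about 19 h x 0.0062
> > ~= 7 minutes of total GC time.)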
> >
> > Look at the GC performance numbers on the summary tab.  Obviously
> > minor GCs are much faster in this regard than major GCs. You can
> > reduce your total GC time by increasing the size of your young
> > generation.  You will get more or slower young collections but fewer
> > and faster old collections.  Overall the total time will be less than
> > it is now and the longest pauses will be shorter.
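> >
> > For example, a sketch only (the value is something you'd have to tune, and
> > I'm assuming your memory flags live in CATALINA_OPTS in bin/setenv.sh):
> >
> >     # explicitly size the young generation instead of the ~150MB default
> >     CATALINA_OPTS="$CATALINA_OPTS -Xmn1g"
> >
> > -Xmn sets the initial and maximum young generation size together; the
> > -XX:NewSize/-XX:MaxNewSize pair does the same if you want them to differ.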
> >
> > As others have said, though, something doesn’t add up.  CMS is only
> > stop-the-world during certain phases.  (Not the ones with “concurrent”
> > in the name.)  If you feel these GC events coincide with pauses in
> > your app, you can try a thread dump or three (kill -3 <pid>) during
> > the pause.  Use a tool like Samurai to parse the output.  This might
> > only be practical for longer pauses, though.
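> >
> > Roughly like this, assuming a stock Tomcat setup where the dumps land in
> > catalina.out (the pid is a placeholder):
> >
> >     # take three thread dumps a few seconds apart while the app seems frozen
> >     for i in 1 2 3; do kill -3 <tomcat-pid>; sleep 5; done
> >
> > Then point Samurai at catalina.out (or read it by hand) and see what the
> > threads were doing during the pause.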
> >
> > Also, is it possible the VM itself is having a problem?  Maybe you
> > should talk to your virtualization team to see how stressed the
> > hardware is.  VMWare has an informative java best practices doc:
> >
> > http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/enterprise-java-applications-on-vmware-best-practices-guide.pdf
> >
> > John
> >
> > From: George I. Develekos [mailto:gdevele...@omilia.com]
> > Sent: Wednesday, November 16, 2016 5:21 AM
> > To: users@tomcat.apache.org
> > Subject: Re: Please help with Tomcat Garbage Collection
> >
> > I'm attaching three screenshots of the GCViewer app as it processed
> > the complete gc.log file (about 19 hours).
> >
> > Please have a look and advise on what I can do to limit Full-GC times.
> > As of now I have a recommendation to increase the Young Gen.
> >
> > The setup in summary:
> >
> > We are using Java 6 (stuck with CentOS 5.8 at this time) and Tomcat
> > 7.0.64.
> >
> > Xmx is 5G, Xms is 2G, and GC options are -XX:+UseConcMarkSweepGC
> > -XX:+CMSIncrementalMode
> >
> >
> >
> >
> >
> >
> > On 11/15/2016 11:45 PM, john.e.gr...@wellsfargo.com wrote:
> >
> >         -----Original Message-----
> >
> >         From: George I. Develekos [mailto:gdevele...@omilia.com]
> >
> >         Sent: Tuesday, November 15, 2016 3:00 PM
> >
> >         To: users@tomcat.apache.org
> >
> >         Subject: Re: Please help with Tomcat Garbage Collection
> >
> >         The system does very little swapping, both when it's GC'ing and
> >         when it's not.
> >
> >         Less than 100MB worth of swap is taken.
> >
> >         Giving Tomcat its own HW is not an option at this time, especially
> >         as there's no guarantee it'll solve the problem. Besides it would
> >         be a VM anyway, not physical dedicated HW. The current server is
> >         also a VM.
> >
> >         On 15-Nov-16 10:55 PM, Zdeněk Henek wrote:
> >
> >             I would start with moving this tomcat to its own hw.
> >
> >             Did you check swap? These long pauses could be because part
> >             of your heap is swapped to hdd.
> >
> >             Regards,
> >
> >             Zdenek Henek
> >
> >             On Tue, Nov 15, 2016, 21:37 George I. Develekos
> >             <gdevele...@omilia.com> wrote:
> >
> >                 On 15-Nov-16 10:22 PM, Christopher Schultz wrote:
> >
> >                     George,
> >
> >                     On 11/15/16 10:46 AM, George I. Develekos wrote:
> >
> >                         Hello guys,
> >
> >                         We are having problems on a production system with
> >                         very long "full GC" times, as long as 1200 sec real
> >                         time (!!!).
> >
> >                         We are using Java 6 (stuck with CentOS 5.8 at this
> >                         time) and Tomcat 7.0.64.
> >
> >                         Xmx is 5G, Xms is 2G, and GC options are
> >                         -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
> >
> >                         No other custom memory-related settings are in place.
> >
> >                         Looking at the GC log, the last few Full-GC entries are:
> >
> >                         1367.020: [Full GC 1367.020: [CMS: 1178831K->527456K(1926784K), 2.1117220 secs] 1250378K->527456K(2080128K), [CMS Perm : 169762K->56187K(169984K)] icms_dc=0 , 2.1118160 secs] [Times: user=1.96 sys=0.13, real=2.11 secs]
> >
> >                         2579.317: [Full GC 2579.317: [CMS2581.876: [CMS-concurrent-mark: 2.558/1212.733 secs] [Times: user=113.05 sys=28.01, real=1212.49 secs]
> >
> >                         3539.969: [Full GC 3539.969: [CMS3540.056: [CMS-concurrent-sweep: 1.571/23.223 secs] [Times: user=6.12 sys=1.36, real=23.21 secs]
> >
> >                         4070.456: [Full GC 4070.457: [CMS: 1252569K->591200K(1926784K), 2.3447040 secs] 1270617K->591200K(2080128K), [CMS Perm : 169983K->56598K(169984K)] icms_dc=0 , 2.3448140 secs] [Times: user=2.18 sys=0.14, real=2.34 secs]
> >
> >                         What can we do?
> >
> >                     1367.020 Full GC duration=2.11s
> >
> >                     2579.317 Full GC duration=1212.49s
> >
> >                     So your full GC immediately started another full GC
> >                     that took 20 minutes?
> >
> >                     Are you only showing certain FULL GC activity from
> >                     your log, or is that everything?
> >
> >                     CMS should have a mark and then a sweep each time,
> >                     but your times don't seem to add up.
> >
> >                     Also note that the whole point of CMS is that there
> >                     isn't any stop-the-world during the mark portion of
> >                     the process.
> >
> >                     Are you actually experiencing a problem, or are you
> >                     just suffering from instrumentor's remorse?
> >
> >                     -chris
> >
> >                 Chris,
> >
> >                 What I listed is the result of the command:
> >
> >                 grep "Full GC" gc.log
> >
> >                 So (obviously) I have skipped other GC activity, i.e.
> >                 whatever GC activity didn't include the "Full GC" string.
> >
> >                 Yes, we are having app trouble due to the GC delays, so
> >                 this is a real problem. Our application has real-time
> >                 constraints, so the GC delays cannot be tolerated. I
> >                 selected those GC options in order to avoid long GC times.
> >
> >                 Additionally, these periods coincide with high CPU for
> >                 that JVM process. From the 5-20% CPU where it is normally,
> >                 it jumps to 60% or more.
> >
> >                 Once GC is done, our app rushes to catch up with tasks
> >                 that had to wait for GC to finish.
> >
> >                 Answering another question from a member who has kindly
> >                 responded, yes the server is running other stuff.
> >                 Basically it runs three tomcats, the main one being this
> >                 one. It also runs a DB2 database that has close-to-zero
> >                 activity.
> >
> >                 George
> >
> >     It might be helpful if you could post a larger chunk of your GC log,
> >     at least long enough to cover the start and end of the CMS phases and
> >     maybe even more.  Additionally, try using a tool like GCViewer to
> >     analyze the log.
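> >
> >     If posting the whole file is awkward, even a windowed grep helps, for
> >     example (the window size here is arbitrary, widen it as needed):
> >
> >         grep -B 25 -A 25 "CMS-initial-mark" gc.log
> >
> >     That way the initial-mark, the concurrent phases and the remark around
> >     each CMS cycle are visible, not just the lines matching "Full GC".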
> >
> >     How many CPUs do you have?  60% CPU usage isn't usually a big deal.
> >
> >     Like Chris already said, this is not a stop-the-world phase, so your
> >     application should continue working in parallel with the garbage
> >     collector.
> >
> >     Looks like your young generation is only 150MB (2080128K - 1926784K).
> >     That's very small for a 2-5GB heap.  Are you explicitly setting it
> >     somewhere or is the JVM choosing that for you?  It's so small that
> >     your old generation might be filling up faster than it should, leading
> >     to more frequent full collections.  You could try setting the young
> >     generation to something like 25-50% of the total heap.  You'd get a
> >     lot of small pauses as the young gen is collected but fewer long ones.
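> >
> >     As a rough illustration only (a starting point to tune, not a
> >     recommendation), a ratio-based flag keeps the young gen proportional
> >     to whatever the heap grows to between -Xms 2G and -Xmx 5G:
> >
> >         -XX:NewRatio=2    # old:young = 2:1, i.e. young gen is ~1/3 of the heap
> >
> >     An absolute size via -Xmn works too, but a fixed value behaves a bit
> >     differently while the heap is still growing from 2G toward 5G.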
> >
> >     John
> >
> >

CMS consists of phases.  It's not just one monolithic activity.  The 
stop-the-world pauses are the initial-mark and remark phases.  These are 
identified in the log as CMS-initial-mark and CMS-remark.  (All young GCs are 
stop-the-world, too, by the way.)  Otherwise your app should continue to run 
concurrently with GC.  That's what's nice about CMS: technically the 
collections take longer than with the parallel collector, but the app continues 
to work normally during most of that time.
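
A quick way to see how long the actual stop-the-world parts were is to pull
those phases out of the log you already have, e.g.:

    grep -E "CMS-initial-mark|CMS-remark" gc.log

Those lines, plus the young collections, are the pauses your app actually
feels; the big numbers on the concurrent-mark and concurrent-sweep lines are
wall-clock durations of concurrent work, not pauses.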

There probably is an increase in CPU during that time, but the collections are 
so short you might not notice.  You might notice an increase in response time, 
but it could be hard to separate the impact of GC from the rest of the noise in 
the system (response time variability in backends, etc.)

John
