Hi all, a few of you also replied by private mail, thanks. Supplying the -Xloggc option worked, although I don't understand where the info went before, and the log rotation option doesn't rotate anything but just truncates the log file on every JVM restart, which sucks. But ok, I have some data.
Below you find the log of one session. Users reported bad response times that correspond with the Full GC runs, but also very bad response times from 17:00 onward where there is often no GC activity logged at the times that users experienced waiting time. So I'm not entirely sure if this is all caused by GC. Anyway, heap sizes of up to 60 GByte are apparently not so common, at least not with WebObjects. I found a helpful article [1] about very large JVM heaps, including the hint to the Zing Azul JVM, which features a stopless garbage collector [3], but costs $8000 per server and year. In the end, we may actually be trying that if other options fail, but first I need to understand more about what's in memory. I'm currently approaching that question via JProfiler, especially to find out if that's all snapshot cache (how would I limit that in size?) or some other homemade memory leak. Does anyone have experience with the combination of Azul and WO? I also found one strange event today where the application appeared stuck, but continued after a minute or two, while the gc log file was truncated at that time, without the JVM restarting. I can't see how that's supposed to happen. Here's how that looked in my shell running "tail -F gc.log": [PSYoungGen: 15256082K->3431947K(16962048K)] 25580475K->13801928K(27895808K), 1.1505415 secs] [Times: user=6.00 sys=0.03, real=1.15 secs] 2016-01-26T12:11:38.640+0100: 26136.407: [GC (Allocation Failure) Desired survivor size 4679794688 bytes, new threshold 15 (max 15) [PSYoungGen: 15791627K->3414349K(16958976K)] 26161608K->13935770K(27892736K), 1.0761646 secs] [Times: user=6.10 sys=0.17, real=1.08 secs] 2016-01-26T12:16:10.835+0100: 26408.603: [GC (Allocation Failure) Desired survivor size 4657250304 bytes, new threshold 15 (max 15) [PSYoungGen: 15774029K->3449938K(17397760K)] 26295450K->14128338K(28331520K), 1.3285101 secs] [Times: user=6.24 sys=1.39, real=1.33 secs] 2016-01-26T12:16:12.164+0100: 26409.931: [Full GC (Ergonomics) [PSYoungGen: 3449938K->2355719K(17397760K)] [ParOldGen: 10678400K->10933508K(13649408K)] 14128338K->13289227K(31047168K), [Metaspace: 84345K->84345K(94208K)], 12.1170523 secs] [Times: user=67.98 sys=1.03, real=12.11 secs] 2016-01-26T12:21:18.990+0100: 26716.757: [GC (Allocation Failure) Desired survivor size 4644667392 bytes, new threshold 15 (max 15) [PSYoungGen: 15218183K->2398851K(17410560K)] 26151691K->13458637K(31059968K), 1.3389149 secs] [Times: user=8.72 sys=0.17, real=1.34 secs] tail: gc.log: file truncated Java HotSpot(TM) 64-Bit Server VM (25.66-b17) for linux-amd64 JRE (1.8.0_66-b17), built on Oct 6 2015 17:28:34 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8) Memory: 4k page, physical 99008820k(13655516k free), swap 67108860k(66990156k free) CommandLine flags: -XX:InitialHeapSize=1584141120 -XX:+ManagementServer -XX:MaxHeapSize=90194313216 -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:-UseGCLogFileRotation -XX:+UseParallelGC 2016-01-26T12:33:36.029+0100: 27453.796: [GC (Allocation Failure) Desired survivor size 4588568576 bytes, new threshold 15 (max 15) [PSYoungGen: 15261315K->2538940K(17720832K)] 26321101K->13670347K(31370240K), 0.7936427 secs] [Times: user=4.62 sys=0.03, real=0.80 secs] 2016-01-26T12:39:29.156+0100: 27806.923: [GC (Allocation Failure) And here's the above-mentioned full gc log of 4 hours during the most critical usage time where users experienced waits. Notable wait times according to user's notes occurred at least at the following times: 15:07, 16:05, 16:57, 17:00-17:35, 17:59. 2016-01-25T15:06:54.344+0100: 36634.642: [GC (Allocation Failure) [PSYoungGen: 18701268K->2058545K(23079424K)] 33613120K->17079754K(41686016K), 0.8635088 secs] [Times: user=4.22 sys=0.50, real=0.86 secs] 2016-01-25T15:15:48.438+0100: 37168.736: [GC (Allocation Failure) [PSYoungGen: 19683633K->2231668K(23157248K)] 34704842K->17370085K(41763840K), 0.9898285 secs] [Times: user=4.98 sys=0.05, real=0.99 secs] 2016-01-25T15:24:54.891+0100: 37715.189: [GC (Allocation Failure) [PSYoungGen: 19856756K->2254238K(23623168K)] 34995173K->17626274K(42229760K), 1.0030974 secs] [Times: user=4.91 sys=0.30, real=1.00 secs] 2016-01-25T15:33:58.784+0100: 38259.082: [GC (Allocation Failure) [PSYoungGen: 20552606K->2502793K(23740416K)] 35924642K->17914660K(42347008K), 1.0203825 secs] [Times: user=5.32 sys=0.24, real=1.02 secs] 2016-01-25T15:43:12.959+0100: 38813.256: [GC (Allocation Failure) [PSYoungGen: 20801161K->2677482K(23608320K)] 36213028K->18138909K(42214912K), 1.2300001 secs] [Times: user=5.53 sys=0.22, real=1.23 secs] 2016-01-25T15:51:16.686+0100: 39296.983: [GC (Allocation Failure) [PSYoungGen: 21276394K->2800542K(23760384K)] 36737821K->18411609K(42366976K), 1.1514720 secs] [Times: user=6.19 sys=0.15, real=1.15 secs] 2016-01-25T15:58:35.696+0100: 39735.994: [GC (Allocation Failure) [PSYoungGen: 21399454K->2938402K(23849472K)] 37010521K->18700024K(42456064K), 2.8563665 secs] [Times: user=6.23 sys=8.99, real=2.86 secs] 2016-01-25T16:05:33.310+0100: 40153.607: [GC (System.gc()) [PSYoungGen: 20526288K->2986997K(22197760K)] 36287910K->18889778K(40804352K), 1.2626979 secs] [Times: user=6.22 sys=1.00, real=1.27 secs] 2016-01-25T16:05:34.572+0100: 40154.870: [Full GC (System.gc()) [PSYoungGen: 2986997K->0K(22197760K)] [ParOldGen: 15902781K->16935926K(18606592K)] 18889778K->16935926K(40804352K), [Metaspace: 90226K->90226K(100352K)], 15.0685505 secs] [Times: user=86.67 sys=7.55, real=15.06 secs] 2016-01-25T16:13:35.766+0100: 40636.064: [GC (Allocation Failure) [PSYoungGen: 19210752K->419289K(23559168K)] 36146678K->17355224K(42165760K), 0.5276242 secs] [Times: user=2.41 sys=0.00, real=0.53 secs] 2016-01-25T16:22:46.735+0100: 41187.033: [GC (Allocation Failure) [PSYoungGen: 19353561K->740457K(23835648K)] 36289496K->17676391K(42442240K), 0.6634034 secs] [Times: user=2.52 sys=0.00, real=0.66 secs] 2016-01-25T16:30:31.503+0100: 41651.801: [GC (Allocation Failure) [PSYoungGen: 19674729K->1023572K(24313856K)] 36610663K->17959507K(42920448K), 0.8099960 secs] [Times: user=2.72 sys=0.08, real=0.81 secs] 2016-01-25T16:37:51.095+0100: 42091.393: [GC (Allocation Failure) [PSYoungGen: 20311636K->1263580K(20551680K)] 37247571K->18199515K(39158272K), 0.7992434 secs] [Times: user=3.32 sys=0.01, real=0.80 secs] 2016-01-25T16:49:34.671+0100: 42794.969: [GC (Allocation Failure) [PSYoungGen: 20551644K->1565656K(24408576K)] 37487579K->18501598K(43015168K), 0.8773248 secs] [Times: user=3.74 sys=0.00, real=0.87 secs] 2016-01-25T16:56:34.541+0100: 43214.839: [GC (Allocation Failure) [PSYoungGen: 21011416K->1791681K(21237760K)] 37947358K->18727623K(39844352K), 0.7816826 secs] [Times: user=4.03 sys=0.00, real=0.79 secs] 2016-01-25T17:01:31.494+0100: 43511.792: [GC (Allocation Failure) [PSYoungGen: 21237441K->1997601K(24417280K)] 38173383K->18933551K(43023872K), 0.8769351 secs] [Times: user=4.45 sys=0.09, real=0.88 secs] 2016-01-25T17:07:13.856+0100: 43854.154: [GC (Allocation Failure) [PSYoungGen: 21797153K->2080272K(24207360K)] 38733103K->19016222K(42813952K), 0.8543820 secs] [Times: user=4.50 sys=0.00, real=0.86 secs] 2016-01-25T17:12:37.866+0100: 44178.164: [GC (Allocation Failure) [PSYoungGen: 21879824K->2170000K(24282112K)] 38815774K->19105950K(42888704K), 0.8236605 secs] [Times: user=4.92 sys=0.00, real=0.82 secs] 2016-01-25T17:18:05.977+0100: 44506.275: [GC (Allocation Failure) [PSYoungGen: 22576272K->2299192K(24404992K)] 39512222K->19235143K(43011584K), 0.7759199 secs] [Times: user=4.77 sys=0.00, real=0.78 secs] 2016-01-25T17:28:11.288+0100: 45111.586: [GC (Allocation Failure) [PSYoungGen: 22705464K->2550874K(24838656K)] 39641415K->19486825K(43445248K), 1.3544459 secs] [Times: user=5.52 sys=1.93, real=1.36 secs] 2016-01-25T17:35:39.350+0100: 45559.648: [GC (Allocation Failure) [PSYoungGen: 23578202K->2665756K(24871424K)] 40514153K->19601715K(43478016K), 1.0373681 secs] [Times: user=5.67 sys=0.04, real=1.04 secs] 2016-01-25T17:43:31.775+0100: 46032.073: [GC (Allocation Failure) [PSYoungGen: 23693084K->2891569K(25351168K)] 40629043K->19827535K(43957760K), 2.8873970 secs] [Times: user=5.28 sys=11.09, real=2.89 secs] 2016-01-25T17:51:22.218+0100: 46502.516: [GC (Allocation Failure) [PSYoungGen: 24390449K->3020322K(24519680K)] 41326415K->19956296K(43126272K), 1.0685985 secs] [Times: user=6.17 sys=0.00, real=1.07 secs] 2016-01-25T18:00:19.997+0100: 47040.295: [GC (Allocation Failure) [PSYoungGen: 24519202K->3177498K(25069568K)] 41455176K->20113480K(43676160K), 1.3560052 secs] [Times: user=6.39 sys=0.34, real=1.36 secs] 2016-01-25T18:09:14.251+0100: 47574.548: [GC (Allocation Failure) [PSYoungGen: 24215578K->3059878K(24098304K)] 41151560K->20219258K(42704896K), 1.5082240 secs] [Times: user=6.62 sys=2.86, real=1.50 secs] 2016-01-25T18:15:01.783+0100: 47922.080: [GC (Allocation Failure) [PSYoungGen: 24097958K->2809652K(25209856K)] 41257338K->20232628K(43816448K), 1.8871482 secs] [Times: user=6.38 sys=6.59, real=1.88 secs] 2016-01-25T18:24:10.926+0100: 48471.223: [GC (Allocation Failure) [PSYoungGen: 23806772K->2757610K(23754752K)] 41229748K->20472641K(42361344K), 1.8807114 secs] [Times: user=6.50 sys=6.05, real=1.88 secs] 2016-01-25T18:32:11.039+0100: 48951.336: [GC (Allocation Failure) [PSYoungGen: 23754730K->2664985K(25179136K)] 41469761K->20610485K(43785728K), 1.6722986 secs] [Times: user=5.99 sys=4.89, real=1.68 secs] 2016-01-25T18:32:33.693+0100: 48973.991: [GC (System.gc()) [PSYoungGen: 5342511K->2261788K(25064448K)] 23288012K->20489350K(43671040K), 1.4310250 secs] [Times: user=5.08 sys=5.69, real=1.43 secs] 2016-01-25T18:32:35.124+0100: 48975.422: [Full GC (System.gc()) [PSYoungGen: 2261788K->373549K(25064448K)] [ParOldGen: 18227562K->18606102K(18606592K)] 20489350K->18979651K(43671040K), [Metaspace: 94654K->94654K(104448K)], 17.7869839 secs] [Times: user=109.81 sys=4.10, real=17.79 secs] 2016-01-25T18:38:01.155+0100: 49301.452: [Full GC (Ergonomics) [PSYoungGen: 21485357K->0K(25064448K)] [ParOldGen: 18606102K->17803897K(22484992K)] 40091459K->17803897K(47549440K), [Metaspace: 94728K->94728K(104448K)], 11.2804123 secs] [Times: user=71.57 sys=1.84, real=11.28 secs] 2016-01-25T18:49:54.421+0100: 50014.719: [GC (Allocation Failure) [PSYoungGen: 21111808K->310417K(24994816K)] 38915705K->18114322K(47479808K), 0.4822526 secs] [Times: user=1.49 sys=0.00, real=0.49 secs] 2016-01-25T19:02:11.033+0100: 50751.331: [GC (Allocation Failure) [PSYoungGen: 21435537K->650681K(21775872K)] 39239442K->18454586K(44260864K), 0.4177362 secs] [Times: user=1.93 sys=0.01, real=0.42 secs] 2016-01-25T19:09:54.731+0100: 51215.028: [GC (Allocation Failure) [PSYoungGen: 21775801K->1405472K(25216000K)] 39579706K->19209377K(47700992K), 0.6537367 secs] [Times: user=3.92 sys=0.00, real=0.65 secs] As always, I'm thankful for any more hints. Please reply publicly so everyone else can benefit from the answers in the future. Thanks Maik [1] http://stackoverflow.com/questions/214362/java-very-large-heap-sizes#6340745 <http://stackoverflow.com/questions/214362/java-very-large-heap-sizes#6340745> [2] https://www.azul.com/products/zing/ <https://www.azul.com/products/zing/> [3] http://www.infoq.com/articles/azul_gc_in_detail <http://www.infoq.com/articles/azul_gc_in_detail> > Am 24.01.2016 um 21:01 schrieb Bogdan Zlatanov <bogdan.zlata...@gmail.com>: > > Hi again, > > Try redirecting all the GC output in it's own file with the -Xloggc: > /path/to/gc/log option. > > Cheers, > Bogdan > > On 24/01/2016 20:19, Maik Musall wrote: >> Hi, >> >> yes, I tried, but the documented log output would never appear anywhere in >> my logs. These are the options I'm using: >> >> -verbose:gc >> -XX:+PrintGCDetails >> -XX:+PrintGCDateStamps >> >> Despite these settings, I can't see ANYTHING in the logs about GC. I don't >> know what I'm doing wrong there? >> >> Maik >> >>> Am 24.01.2016 um 13:27 schrieb Bogdan Zlatanov <bogdan.zlata...@gmail.com> >>> <mailto:bogdan.zlata...@gmail.com>: >>> >>> Hi, >>> >>> Have you tried tracing the garbage collection -> >>> https://docs.oracle.com/cd/E19159-01/819-3681/abeig/index.html >>> <https://docs.oracle.com/cd/E19159-01/819-3681/abeig/index.html>? >>> >>> Cheers, >>> Bogdan >>> >>>> On 24/01/2016 13:01, Musall Maik wrote: >>>> Hi, >>>> >>>> I have an application that often becomes laggy and shows slow responses >>>> almost every day around 5 pm. The instance has daily scheduled restarts at >>>> 5 am, so it occurs after 12 hours of use. When I restart it preemptively >>>> at 3 pm, the problem does not occur. When I don't, users suffer bad >>>> response times on many requests, but not all. >>>> >>>> Users are computing heavy statistics in this application, instance size >>>> (mostly snapshot cache) at that time is usually about 50 GByte, while Xmx >>>> is 84 GByte, and XX:MaxPermSize is 1024m. The biggest instance size I've >>>> seen so far in production is 61 GByte, so 50 seems not like a limit where >>>> something weird happens. >>>> >>>> CPU, memory, filesystem, database and network all show nothing special >>>> before and after 5 pm. I monitored the instance with jvisualvm to no >>>> avail. I've been searching for the reason for quite some time now, so I'd >>>> appreciate any suggestion you might have what else to check. >>>> >>>> Thanks >>>> Maik >>>> >>>> >>>> _______________________________________________ >>>> Do not post admin requests to the list. They will be ignored. >>>> Webobjects-dev mailing list (Webobjects-dev@lists.apple.com >>>> <mailto:Webobjects-dev@lists.apple.com>) >>>> Help/Unsubscribe/Update your Subscription: >>>> https://lists.apple.com/mailman/options/webobjects-dev/bogdan.zlatanov%40gmail.com >>>> >>>> <https://lists.apple.com/mailman/options/webobjects-dev/bogdan.zlatanov%40gmail.com> >>>> >>>> This email sent to bogdan.zlata...@gmail.com >>>> <mailto:bogdan.zlata...@gmail.com> >>> _______________________________________________ >>> Do not post admin requests to the list. They will be ignored. >>> Webobjects-dev mailing list (Webobjects-dev@lists.apple.com >>> <mailto:Webobjects-dev@lists.apple.com>) >>> Help/Unsubscribe/Update your Subscription: >>> https://lists.apple.com/mailman/options/webobjects-dev/maik%40selbstdenker.ag >>> >>> <https://lists.apple.com/mailman/options/webobjects-dev/maik%40selbstdenker.ag> >>> >>> This email sent to m...@selbstdenker.ag <mailto:m...@selbstdenker.ag> >> _______________________________________________ >> Do not post admin requests to the list. They will be ignored. >> Webobjects-dev mailing list (Webobjects-dev@lists.apple.com >> <mailto:Webobjects-dev@lists.apple.com>) >> Help/Unsubscribe/Update your Subscription: >> https://lists.apple.com/mailman/options/webobjects-dev/bogdan.zlatanov%40gmail.com >> >> <https://lists.apple.com/mailman/options/webobjects-dev/bogdan.zlatanov%40gmail.com> >> >> This email sent to bogdan.zlata...@gmail.com >> <mailto:bogdan.zlata...@gmail.com> > _______________________________________________ > Do not post admin requests to the list. They will be ignored. > Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) > Help/Unsubscribe/Update your Subscription: > https://lists.apple.com/mailman/options/webobjects-dev/maik%40selbstdenker.ag > > This email sent to m...@selbstdenker.ag
_______________________________________________ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to arch...@mail-archive.com