Hi Maik,

I will guess that the answer to the following is No as you said if you restart 
the apps at 3PM the problem does not a happen.

Could it be something other than garbage collection?  I’ve seen a similar 
problem when someone installed backup software that ran during business hours.  
Could it be something unrelated to  your apps that starts at 5PM?  Or are some 
users starting very intensive processing at the end of their day and impacting 
other users?  Could it be on the database side?  A delay there would leave the 
app stalled with very little CPU or IO activity.  That is the only thing that 
occurred to me other than the GC you are investigating.

Chuck


From: 
<[email protected]<mailto:[email protected]>>
 on behalf of Musall Maik <[email protected]<mailto:[email protected]>>
Date: Tuesday, January 26, 2016 at 5:28 AM
To: Apple WebObjects Developer List 
<[email protected]<mailto:[email protected]>>
Subject: Re: Instance becomes unresponsive after 12 hours

Hi all,

a few of you also replied by private mail, thanks. Supplying the -Xloggc option 
worked, although I don't understand where the info went before, and the log 
rotation option doesn't rotate anything but just truncates the log file on 
every JVM restart, which sucks. But ok, I have some data.

Below you find the log of one session. Users reported bad response times that 
correspond with the Full GC runs, but also very bad response times from 17:00 
onward where there is often no GC activity logged at the times that users 
experienced waiting time. So I'm not entirely sure if this is all caused by GC.

Anyway, heap sizes of up to 60 GByte are apparently not so common, at least not 
with WebObjects. I found a helpful article [1] about very large JVM heaps, 
including the hint to the Zing Azul JVM, which features a stopless garbage 
collector [3], but costs $8000 per server and year. In the end, we may actually 
be trying that if other options fail, but first I need to understand more about 
what's in memory. I'm currently approaching that question via JProfiler, 
especially to find out if that's all snapshot cache (how would I limit that in 
size?) or some other homemade memory leak.

Does anyone have experience with the combination of Azul and WO?

I also found one strange event today where the application appeared stuck, but 
continued after a minute or two, while the gc log file was truncated at that 
time, without the JVM restarting. I can't see how that's supposed to happen. 
Here's how that looked in my shell running "tail -F gc.log":


[PSYoungGen: 15256082K->3431947K(16962048K)] 25580475K->13801928K(27895808K), 
1.1505415 secs] [Times: user=6.00 sys=0.03, real=1.15 secs]
2016-01-26T12:11:38.640+0100: 26136.407: [GC (Allocation Failure)
Desired survivor size 4679794688 bytes, new threshold 15 (max 15)
[PSYoungGen: 15791627K->3414349K(16958976K)] 26161608K->13935770K(27892736K), 
1.0761646 secs] [Times: user=6.10 sys=0.17, real=1.08 secs]
2016-01-26T12:16:10.835+0100: 26408.603: [GC (Allocation Failure)
Desired survivor size 4657250304 bytes, new threshold 15 (max 15)
[PSYoungGen: 15774029K->3449938K(17397760K)] 26295450K->14128338K(28331520K), 
1.3285101 secs] [Times: user=6.24 sys=1.39, real=1.33 secs]
2016-01-26T12:16:12.164+0100: 26409.931: [Full GC (Ergonomics) [PSYoungGen: 
3449938K->2355719K(17397760K)] [ParOldGen: 10678400K->10933508K(13649408K)] 
14128338K->13289227K(31047168K), [Metaspace: 84345K->84345K(94208K)], 
12.1170523 secs] [Times: user=67.98 sys=1.03, real=12.11 secs]
2016-01-26T12:21:18.990+0100: 26716.757: [GC (Allocation Failure)
Desired survivor size 4644667392 bytes, new threshold 15 (max 15)
[PSYoungGen: 15218183K->2398851K(17410560K)] 26151691K->13458637K(31059968K), 
1.3389149 secs] [Times: user=8.72 sys=0.17, real=1.34 secs]
tail: gc.log: file truncated
Java HotSpot(TM) 64-Bit Server VM (25.66-b17) for linux-amd64 JRE 
(1.8.0_66-b17), built on Oct  6 2015 17:28:34 by "java_re" with gcc 4.3.0 
20080428 (Red Hat 4.3.0-8)
Memory: 4k page, physical 99008820k(13655516k free), swap 67108860k(66990156k 
free)
CommandLine flags: -XX:InitialHeapSize=1584141120 -XX:+ManagementServer 
-XX:MaxHeapSize=90194313216 -XX:+PrintGC -XX:+PrintGCDateStamps 
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
-XX:-UseGCLogFileRotation -XX:+UseParallelGC
2016-01-26T12:33:36.029+0100: 27453.796: [GC (Allocation Failure)
Desired survivor size 4588568576 bytes, new threshold 15 (max 15)
[PSYoungGen: 15261315K->2538940K(17720832K)] 26321101K->13670347K(31370240K), 
0.7936427 secs] [Times: user=4.62 sys=0.03, real=0.80 secs]
2016-01-26T12:39:29.156+0100: 27806.923: [GC (Allocation Failure)


And here's the above-mentioned full gc log of 4 hours during the most critical 
usage time where users experienced waits. Notable wait times according to 
user's notes occurred at least at the following times: 15:07, 16:05, 16:57, 
17:00-17:35, 17:59.


2016-01-25T15:06:54.344+0100: 36634.642: [GC (Allocation Failure) [PSYoungGen: 
18701268K->2058545K(23079424K)] 33613120K->17079754K(41686016K), 0.8635088 
secs] [Times: user=4.22 sys=0.50, real=0.86 secs]
2016-01-25T15:15:48.438+0100: 37168.736: [GC (Allocation Failure) [PSYoungGen: 
19683633K->2231668K(23157248K)] 34704842K->17370085K(41763840K), 0.9898285 
secs] [Times: user=4.98 sys=0.05, real=0.99 secs]
2016-01-25T15:24:54.891+0100: 37715.189: [GC (Allocation Failure) [PSYoungGen: 
19856756K->2254238K(23623168K)] 34995173K->17626274K(42229760K), 1.0030974 
secs] [Times: user=4.91 sys=0.30, real=1.00 secs]
2016-01-25T15:33:58.784+0100: 38259.082: [GC (Allocation Failure) [PSYoungGen: 
20552606K->2502793K(23740416K)] 35924642K->17914660K(42347008K), 1.0203825 
secs] [Times: user=5.32 sys=0.24, real=1.02 secs]
2016-01-25T15:43:12.959+0100: 38813.256: [GC (Allocation Failure) [PSYoungGen: 
20801161K->2677482K(23608320K)] 36213028K->18138909K(42214912K), 1.2300001 
secs] [Times: user=5.53 sys=0.22, real=1.23 secs]
2016-01-25T15:51:16.686+0100: 39296.983: [GC (Allocation Failure) [PSYoungGen: 
21276394K->2800542K(23760384K)] 36737821K->18411609K(42366976K), 1.1514720 
secs] [Times: user=6.19 sys=0.15, real=1.15 secs]
2016-01-25T15:58:35.696+0100: 39735.994: [GC (Allocation Failure) [PSYoungGen: 
21399454K->2938402K(23849472K)] 37010521K->18700024K(42456064K), 2.8563665 
secs] [Times: user=6.23 sys=8.99, real=2.86 secs]
2016-01-25T16:05:33.310+0100: 40153.607: [GC (System.gc()) [PSYoungGen: 
20526288K->2986997K(22197760K)] 36287910K->18889778K(40804352K), 1.2626979 
secs] [Times: user=6.22 sys=1.00, real=1.27 secs]
2016-01-25T16:05:34.572+0100: 40154.870: [Full GC (System.gc()) [PSYoungGen: 
2986997K->0K(22197760K)] [ParOldGen: 15902781K->16935926K(18606592K)] 
18889778K->16935926K(40804352K), [Metaspace: 90226K->90226K(100352K)], 
15.0685505 secs] [Times: user=86.67 sys=7.55, real=15.06 secs]
2016-01-25T16:13:35.766+0100: 40636.064: [GC (Allocation Failure) [PSYoungGen: 
19210752K->419289K(23559168K)] 36146678K->17355224K(42165760K), 0.5276242 secs] 
[Times: user=2.41 sys=0.00, real=0.53 secs]
2016-01-25T16:22:46.735+0100: 41187.033: [GC (Allocation Failure) [PSYoungGen: 
19353561K->740457K(23835648K)] 36289496K->17676391K(42442240K), 0.6634034 secs] 
[Times: user=2.52 sys=0.00, real=0.66 secs]
2016-01-25T16:30:31.503+0100: 41651.801: [GC (Allocation Failure) [PSYoungGen: 
19674729K->1023572K(24313856K)] 36610663K->17959507K(42920448K), 0.8099960 
secs] [Times: user=2.72 sys=0.08, real=0.81 secs]
2016-01-25T16:37:51.095+0100: 42091.393: [GC (Allocation Failure) [PSYoungGen: 
20311636K->1263580K(20551680K)] 37247571K->18199515K(39158272K), 0.7992434 
secs] [Times: user=3.32 sys=0.01, real=0.80 secs]
2016-01-25T16:49:34.671+0100: 42794.969: [GC (Allocation Failure) [PSYoungGen: 
20551644K->1565656K(24408576K)] 37487579K->18501598K(43015168K), 0.8773248 
secs] [Times: user=3.74 sys=0.00, real=0.87 secs]
2016-01-25T16:56:34.541+0100: 43214.839: [GC (Allocation Failure) [PSYoungGen: 
21011416K->1791681K(21237760K)] 37947358K->18727623K(39844352K), 0.7816826 
secs] [Times: user=4.03 sys=0.00, real=0.79 secs]
2016-01-25T17:01:31.494+0100: 43511.792: [GC (Allocation Failure) [PSYoungGen: 
21237441K->1997601K(24417280K)] 38173383K->18933551K(43023872K), 0.8769351 
secs] [Times: user=4.45 sys=0.09, real=0.88 secs]
2016-01-25T17:07:13.856+0100: 43854.154: [GC (Allocation Failure) [PSYoungGen: 
21797153K->2080272K(24207360K)] 38733103K->19016222K(42813952K), 0.8543820 
secs] [Times: user=4.50 sys=0.00, real=0.86 secs]
2016-01-25T17:12:37.866+0100: 44178.164: [GC (Allocation Failure) [PSYoungGen: 
21879824K->2170000K(24282112K)] 38815774K->19105950K(42888704K), 0.8236605 
secs] [Times: user=4.92 sys=0.00, real=0.82 secs]
2016-01-25T17:18:05.977+0100: 44506.275: [GC (Allocation Failure) [PSYoungGen: 
22576272K->2299192K(24404992K)] 39512222K->19235143K(43011584K), 0.7759199 
secs] [Times: user=4.77 sys=0.00, real=0.78 secs]
2016-01-25T17:28:11.288+0100: 45111.586: [GC (Allocation Failure) [PSYoungGen: 
22705464K->2550874K(24838656K)] 39641415K->19486825K(43445248K), 1.3544459 
secs] [Times: user=5.52 sys=1.93, real=1.36 secs]
2016-01-25T17:35:39.350+0100: 45559.648: [GC (Allocation Failure) [PSYoungGen: 
23578202K->2665756K(24871424K)] 40514153K->19601715K(43478016K), 1.0373681 
secs] [Times: user=5.67 sys=0.04, real=1.04 secs]
2016-01-25T17:43:31.775+0100: 46032.073: [GC (Allocation Failure) [PSYoungGen: 
23693084K->2891569K(25351168K)] 40629043K->19827535K(43957760K), 2.8873970 
secs] [Times: user=5.28 sys=11.09, real=2.89 secs]
2016-01-25T17:51:22.218+0100: 46502.516: [GC (Allocation Failure) [PSYoungGen: 
24390449K->3020322K(24519680K)] 41326415K->19956296K(43126272K), 1.0685985 
secs] [Times: user=6.17 sys=0.00, real=1.07 secs]
2016-01-25T18:00:19.997+0100: 47040.295: [GC (Allocation Failure) [PSYoungGen: 
24519202K->3177498K(25069568K)] 41455176K->20113480K(43676160K), 1.3560052 
secs] [Times: user=6.39 sys=0.34, real=1.36 secs]
2016-01-25T18:09:14.251+0100: 47574.548: [GC (Allocation Failure) [PSYoungGen: 
24215578K->3059878K(24098304K)] 41151560K->20219258K(42704896K), 1.5082240 
secs] [Times: user=6.62 sys=2.86, real=1.50 secs]
2016-01-25T18:15:01.783+0100: 47922.080: [GC (Allocation Failure) [PSYoungGen: 
24097958K->2809652K(25209856K)] 41257338K->20232628K(43816448K), 1.8871482 
secs] [Times: user=6.38 sys=6.59, real=1.88 secs]
2016-01-25T18:24:10.926+0100: 48471.223: [GC (Allocation Failure) [PSYoungGen: 
23806772K->2757610K(23754752K)] 41229748K->20472641K(42361344K), 1.8807114 
secs] [Times: user=6.50 sys=6.05, real=1.88 secs]
2016-01-25T18:32:11.039+0100: 48951.336: [GC (Allocation Failure) [PSYoungGen: 
23754730K->2664985K(25179136K)] 41469761K->20610485K(43785728K), 1.6722986 
secs] [Times: user=5.99 sys=4.89, real=1.68 secs]
2016-01-25T18:32:33.693+0100: 48973.991: [GC (System.gc()) [PSYoungGen: 
5342511K->2261788K(25064448K)] 23288012K->20489350K(43671040K), 1.4310250 secs] 
[Times: user=5.08 sys=5.69, real=1.43 secs]
2016-01-25T18:32:35.124+0100: 48975.422: [Full GC (System.gc()) [PSYoungGen: 
2261788K->373549K(25064448K)] [ParOldGen: 18227562K->18606102K(18606592K)] 
20489350K->18979651K(43671040K), [Metaspace: 94654K->94654K(104448K)], 
17.7869839 secs] [Times: user=109.81 sys=4.10, real=17.79 secs]
2016-01-25T18:38:01.155+0100: 49301.452: [Full GC (Ergonomics) [PSYoungGen: 
21485357K->0K(25064448K)] [ParOldGen: 18606102K->17803897K(22484992K)] 
40091459K->17803897K(47549440K), [Metaspace: 94728K->94728K(104448K)], 
11.2804123 secs] [Times: user=71.57 sys=1.84, real=11.28 secs]
2016-01-25T18:49:54.421+0100: 50014.719: [GC (Allocation Failure) [PSYoungGen: 
21111808K->310417K(24994816K)] 38915705K->18114322K(47479808K), 0.4822526 secs] 
[Times: user=1.49 sys=0.00, real=0.49 secs]
2016-01-25T19:02:11.033+0100: 50751.331: [GC (Allocation Failure) [PSYoungGen: 
21435537K->650681K(21775872K)] 39239442K->18454586K(44260864K), 0.4177362 secs] 
[Times: user=1.93 sys=0.01, real=0.42 secs]
2016-01-25T19:09:54.731+0100: 51215.028: [GC (Allocation Failure) [PSYoungGen: 
21775801K->1405472K(25216000K)] 39579706K->19209377K(47700992K), 0.6537367 
secs] [Times: user=3.92 sys=0.00, real=0.65 secs]


As always, I'm thankful for any more hints. Please reply publicly so everyone 
else can benefit from the answers in the future.


Thanks
Maik


[1] http://stackoverflow.com/questions/214362/java-very-large-heap-sizes#6340745
[2] https://www.azul.com/products/zing/
[3] http://www.infoq.com/articles/azul_gc_in_detail


Am 24.01.2016 um 21:01 schrieb Bogdan Zlatanov 
<[email protected]<mailto:[email protected]>>:

Hi again,

Try redirecting all the GC output in it's own file with the -Xloggc: 
/path/to/gc/log option.

Cheers,
Bogdan

On 24/01/2016 20:19, Maik Musall wrote:

Hi,

yes, I tried, but the documented log output would never appear anywhere in my 
logs. These are the options I'm using:

-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps

Despite these settings, I can't see ANYTHING in the logs about GC. I don't know 
what I'm doing wrong there?

Maik



Am 24.01.2016 um 13:27 schrieb Bogdan Zlatanov 
<[email protected]><mailto:[email protected]>:

Hi,

Have you tried tracing the garbage collection -> 
https://docs.oracle.com/cd/E19159-01/819-3681/abeig/index.html?

Cheers,
Bogdan



On 24/01/2016 13:01, Musall Maik wrote:
Hi,

I have an application that often becomes laggy and shows slow responses almost 
every day around 5 pm. The instance has daily scheduled restarts at 5 am, so it 
occurs after 12 hours of use. When I restart it preemptively at 3 pm, the 
problem does not occur. When I don't, users suffer bad response times on many 
requests, but not all.

Users are computing heavy statistics in this application, instance size (mostly 
snapshot cache) at that time is usually about 50 GByte, while Xmx is 84 GByte, 
and XX:MaxPermSize is 1024m. The biggest instance size I've seen so far in 
production is 61 GByte, so 50 seems not like a limit where something weird 
happens.

CPU, memory, filesystem, database and network all show nothing special before 
and after 5 pm. I monitored the instance with jvisualvm to no avail. I've been 
searching for the reason for quite some time now, so I'd appreciate any 
suggestion you might have what else to check.

Thanks
Maik


_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list      
([email protected]<mailto:[email protected]>)
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/webobjects-dev/bogdan.zlatanov%40gmail.com

This email sent to [email protected]<mailto:[email protected]>

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list      
([email protected]<mailto:[email protected]>)
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/webobjects-dev/maik%40selbstdenker.ag

This email sent to [email protected]<mailto:[email protected]>

 _______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list      
([email protected]<mailto:[email protected]>)
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/webobjects-dev/bogdan.zlatanov%40gmail.com

This email sent to [email protected]<mailto:[email protected]>

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list      
([email protected]<mailto:[email protected]>)
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/webobjects-dev/maik%40selbstdenker.ag

This email sent to [email protected]<mailto:[email protected]>

 _______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list      ([email protected])
Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com

This email sent to [email protected]

Reply via email to