I am currently using the Adobe Experience Manager (AEM also known as CQ)
for a Client's site (Java platform). It uses OpenJDK:
java version "1.7.0_65" OpenJDK Runtime Environment
(rhel-2.5.1.2.el6_5-x86_64 u65-b17) OpenJDK 64-Bit Server VM (build
24.65-b04, mixed mode)
It is running on Rackspace with the following:
vCPU: 4 Memory: 16GB Guest OS: Red Hat Enterprise Linux 6 (64-bit)
Since it has been in production I have been experiencing very slow
performance on the part of the application. It goes like this I launch the
app, everything is smooth then 7 to 10 days later the CPU usage spikes to
400% (~4000 users/day hit the site). The site becomes exceptionally slow
and never becomes an OOM exception.
Since I am a novice at Java Memory management I started reading about how
it works and found tools like jstat and jmap. When the system was
overwhelmed the second time around, I got a heap dump and dug into it.
It all seems to be pointing out at org.slf4j.helpers.BasicMarkerFactory and
org.slf4j.helpers.BasicMarker as when I analyze it with MAT eclipse I see
that the biggest retained object by retained size is:
org.slf4j.helpers.BasicMarkerFactory @ 0x6021a4f00Shallow Size: 16 B
Retained Size: 6.8 GB
and
When I run a *Leak suspects report* I get the following result:
DescriptionOne instance of "org.slf4j.helpers.BasicMarkerFactory"
loaded by "org.apache.felix.framework.BundleWiringImpl$BundleClassLoaderJava5
@ 0x60219a878" occupies 7,263,024,848 (96.71%) bytes. The memory is
accumulated in one instance of
"java.util.concurrent.ConcurrentHashMap$Segment[]" loaded by "<system
class loader>".
Keywords
java.util.concurrent.ConcurrentHashMap$Segment[]
org.apache.felix.framework.BundleWiringImpl$BundleClassLoaderJava5 @ 0x60219a878
org.slf4j.helpers.BasicMarkerFactory
and
Shortest Paths To the Accumulation Point
Class Name
Shallow Heap Retained Heap
java.util.concurrent.ConcurrentHashMap$Segment[16] @ 0x6021a4f40
80 7,263,024,784.... segments
java.util.concurrent.ConcurrentHashMap @ 0x6021a4f10
48 7,263,024,832
.
... markerMap org.slf4j.helpers.BasicMarkerFactory @ 0x6021a4f00
16 7,263,024,848
.
... markerFactory org.slf4j.impl.StaticMarkerBinder @
0x6021d3970 16 16
.
... SINGLETON class org.slf4j.impl.StaticMarkerBinder @
0x6021d38f8 8 24
.
... [328] java.lang.Object[640] @ 0x6021d2ee8
2,576 9,592
.
... elementData java.util.Vector @ 0x6021d0fe0
32 9,624
.
... classes org.apache.felix.framework.BundleWiringImpl$
BundleClassLoaderJava5 @ 0x6021c32e0
96 26,888
.
... <classloader> class
ch.qos.logback.classic.Logger @
0x600be4310
16 16
. .
. ...<class>
ch.qos.logback.classic.Logger @ 0x600282a78 48
48
.
... <Java Local> java.lang.Thread @ 0x60077b450
pool-9-thread-1 Thread
104 3,344
.
... <class>
ch.qos.logback.classic.Logger @ 0x60025b850 48
48
.
... <class>
ch.qos.logback.classic.Logger @ 0x604b0a708 48
48
.
... <class>
ch.qos.logback.classic.Logger @ 0x604b0a6d8 48
48
.
... <class>
ch.qos.logback.classic.Logger @ 0x6049debe0 48
48
.
... <class>
ch.qos.logback.classic.Logger @ 0x604535228 48
48
.
... <class>
ch.qos.logback.classic.Logger @ 0x604124248 48
48
Also when I run:
$ sudo -u aem jmap -histo PID
num #instances #bytes class
name----------------------------------------------
1: 11460084 950827248 [C
2: 10740160 257763840 java.lang.String
3: 7681495 245807840
java.util.concurrent.ConcurrentHashMap$HashEntry
4: 7563527 181524648 org.slf4j.helpers.BasicMarker
5: 217007 173568376 [I
6: 177602 158721184 [B
7: 60611 69739136
[Ljava.util.concurrent.ConcurrentHashMap$HashEntry;
8: 1147481 69348496 [Ljava.lang.Object;
9: 1797107 43130568
org.apache.jackrabbit.oak.plugins.segment.RecordId
10: 208912 33824544 <constMethodKlass>
11: 570143 31928008 org.mozilla.javascript.ast.Name
12: 22350 27643920 <constantPoolKlass>
13: 208912 26752544 <methodKlass>
14: 821217 26278944 java.util.UUID
15: 793800 25401600 java.util.HashMap$Entry
16: 532946 21317840 org.mozilla.javascript.Node
17: 792296 19015104 java.lang.Long
18: 191294 18335600 [Ljava.util.HashMap$Entry;
19: 22350 16133328 <instanceKlassKlass>
20: 173883 15855152 [Ljava.lang.String;
21: 635690 15256560
org.apache.sling.engine.impl.request.SlingRequestProgressTracker$TrackingEntry
22: 18509 14662848 <constantPoolCacheKlass>
23: 911112 14577792 java.lang.Integer
24: 255426 14303856
org.apache.jackrabbit.oak.plugins.segment.SegmentNodeBuilder
25: 519324 12463776 java.util.ArrayList
26: 254643 12222864
org.apache.jackrabbit.oak.core.SecureNodeBuilder
27: 137703 11016240 java.lang.reflect.Method
28: 312116 9987712
org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState
29: 19236 9828448
[Lorg.apache.jackrabbit.oak.plugins.segment.SegmentId;
30: 242179 9687160 java.util.TreeMap$Entry
31: 197121 9461808 java.util.HashMap
32: 15041 9416328 <methodDataKlass>
33: 387927 9310248
org.apache.jackrabbit.oak.plugins.segment.MapRecord
34: 250049 8001568
org.apache.jackrabbit.oak.plugins.memory.MemoryNodeBuilder$UnconnectedHead
35: 248586 7954752 org.apache.jackrabbit.oak.core.MutableTree
36: 107865 7948112 [S
37: 191950 7678000 java.util.LinkedHashMap$Entry
38: 102212 6541568 org.mozilla.javascript.ast.PropertyGet
39: 37021 6515696 org.mozilla.javascript.ast.FunctionNode
40: 161905 6476200 org.mozilla.javascript.ScriptableObject$Slot
.....
8210: 1 16 org.slf4j.helpers.BasicMarkerFactory
I noticed:
4: 7563527 181524648 org.slf4j.helpers.BasicMarker
and
8210: 1 16 org.slf4j.helpers.BasicMarkerFactory
When I go into the documentation of org.slf4j.helpers.BasicMarkerFacotry I
see the following that gets my attention:
detachMarkerpublic boolean detachMarker(String name)Description copied
from interface: IMarkerFactoryDetach an existing marker.Note that
after a marker is detached, there might still be "dangling" references
to the detached marker.
Specified by:
detachMarker in interface IMarkerFactoryParameters:
name - The name of the marker to detachReturns:
whether the marker could be detached or not
In particular:
Note that after a marker is detached, there might still be "dangling"
references to the detached marker.
1. Hopefully someone will be able to help pinpoint the cause of my
issues as I am a little lost here ?
2. Has anyone seen this before ?
3. How could I go about troubleshooting this issue furthermore ?
4. Do you agree that org.slf4j.helpers.BasicMarker and
org.slf4j.helpers.BasicMarkerFactory seem to be the root cause of my issues
?
5. Is my logging configuration a suspect ?
6. Is this a slow memory leak or a performance tuning issue (in my
opinion seem like the memory leaks slowly over a week or so) ?
7. Perhaps "org.apache.felix.framework" is failing to free resources ?
*Any advice is welcomed.*
Thanks in advance.