Hi Radu,

Thanks for replying. As you suggested, we have changed to Java 11, reverted to the Solr defaults, and removed ConcGCThreads and ParallelGCThreads. But the issue persists. We tried several heap sizes (30GB, 150GB, 250GB, etc.) with no effect: Solr still stops within 5 minutes of a restart because the JVM heap fills up completely. It is not able to recover, and the heap gets fully occupied on all servers.
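
For reference, here is roughly what we now have in solr.in.sh on each node (the Java path below is just an example; the exact value differs per machine):

  SOLR_JAVA_HOME="/usr/lib/jvm/java-11-openjdk"   # moved to Java 11
  SOLR_HEAP="30g"                                 # also tried 150g and 250g
  # GC_TUNE is no longer set, so Solr falls back to its default G1GC flags
  # (our old ConcGCThreads and ParallelGCThreads overrides are removed)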
I'm still not able to figure out the issue with Solr here. Any suggestions?

Regards!

On Mon, May 10, 2021 at 12:33 PM Radu Gheorghe <[email protected]> wrote:

> Hi Vigz,
>
> If you have a 300GB of RAM machine, you’d want a lower heap size, to leave
> room for OS to cache files. Also, the real memory usage of the JVM would
> likely go beyond your RAM, in which case either the JVM memory will go to
> swap (which would kill performance) or the OS will kill the process if swap
> is disabled (sounds to be your case here, you can double-check in dmesg).
>
> I also doubt you need 300GB of heap for a 300GB index. In most use-cases,
> 30GB of heap would be plenty.
>
> Here’s what you can try first:
> - lower heap size. Try with half (150GB) and move from there. You’ll want
> to monitor your actual heap usage (we have a tool that does that, the link
> is in my signature) and adjust further. I would be surprised if you need
> more than 30GB
> - if you have a recent Java version (I would recommend Java 11), the
> defaults for G1GC should be pretty sensible. So you may revert to Solr’s
> defaults, maybe remove the ConcGCThreads and ParallelGCThreads and rely on
> the Java defaults if you have a big box (like you seem to have).
>
> Lastly, you may want to hard commit more often (autoCommit -> set maxSize
> to 100m or so and move from there). You may also want to autoSoftCommit,
> instead of committing from the application.
>
> Best regards,
> Radu
> --
> Sematext Cloud - Full Stack Observability - https://sematext.com
> Solr and Elasticsearch Consulting, Training and Production Support
>
> > On 10 May 2021, at 08:54, Vignan Malyala <[email protected]> wrote:
> >
> > Hi everyone,
> >
> > We have 3 cluster solr running in 3 different machines with an index
> > size of 300 GB.
> > RAM: 300 GB per node
> > Heap - Xms: 240GB Xmx: 300GB
> > Index size: 300GB
> >
> > GC_TUNE="-XX:+UseG1GC
> >   -XX:InitiatingHeapOccupancyPercent=45
> >   -XX:ConcGCThreads=6
> >   -XX:ParallelGCThreads=30
> >   -XX:G1ReservePercent=20
> >
> > <autoCommit>
> >   <maxTime>${solr.autoCommit.maxTime:400000}</maxTime>
> >   <openSearcher>false</openSearcher>
> > </autoCommit>
> >
> > <autoSoftCommit>
> >   <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
> > </autoSoftCommit>
> >
> > Our cloud servers suddenly stopped yesterday. When we try to restart,
> > our JVM heap size goes to max of 300 GB just in few seconds and we get the
> > following message before stopping automatically.
> >
> > Heap before GC invocations=0 (full 0):
> >  garbage-first heap   total 251658240K, used 360448K [0x00007eba80000000, 0x00007eba8200f000, 0x00007f0580000000)
> >   region size 32768K, 12 young (393216K), 0 survivors (0K)
> >  Metaspace       used 20504K, capacity 21158K, committed 21248K, reserved 22528K
> > 2021-05-10T05:31:59.511+0000: 3.036: [GC pause (Metadata GC Threshold) (young) (initial-mark)
> > Desired survivor size 805306368 bytes, new threshold 15 (max 15)
> >
> > {Heap before GC invocations=11 (full 0):
> >  garbage-first heap   total 288849920K, used 20398080K [0x00007eba80000000, 0x00007eba82011378, 0x00007f0580000000)
> >   region size 32768K, 440 young (14417920K), 54 survivors (1769472K)
> >  Metaspace       used 58413K, capacity 61495K, committed 61696K, reserved 63488K
> > 2021-05-10T05:33:15.477+0000: 79.002: [GC pause (G1 Evacuation Pause) (young)
> > Desired survivor size 922746880 bytes, new threshold 1 (max 15)
> > - age   1: 1043976736 bytes, 1043976736 total
> > - age   2:  766998080 bytes, 1810974816 total
> > , 0.4319767 secs]
> >    [Parallel Time: 408.3 ms, GC Workers: 30]
> >       [GC Worker Start (ms): Min: 79002.5, Avg: 79003.0, Max: 79003.6, Diff: 1.2]
> >       [Ext Root Scanning (ms): Min: 0.1, Avg: 0.8, Max: 2.7, Diff: 2.6, Sum: 23.7]
> >       [Update RS (ms): Min: 0.0, Avg: 1.7, Max: 3.1, Diff: 3.1, Sum: 51.7]
> >          [Processed Buffers: Min: 0, Avg: 3.8, Max: 17, Diff: 17, Sum: 113]
> >       [Scan RS (ms): Min: 13.9, Avg: 15.8, Max: 16.7, Diff: 2.8, Sum: 474.0]
> >       [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 2.1, Diff: 2.1, Sum: 4.3]
> >       [Object Copy (ms): Min: 385.5, Avg: 387.5, Max: 390.6, Diff: 5.1, Sum: 11624.2]
> >       [Termination (ms): Min: 0.1, Avg: 0.5, Max: 0.9, Diff: 0.9, Sum: 13.8]
> >          [Termination Attempts: Min: 1, Avg: 82.1, Max: 172, Diff: 171, Sum: 2464]
> >       [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.4, Diff: 0.4, Sum: 3.6]
> >       [GC Worker Total (ms): Min: 405.9, Avg: 406.5, Max: 407.3, Diff: 1.4, Sum: 12195.3]
> >       [GC Worker End (ms): Min: 79409.4, Avg: 79409.5, Max: 79409.8, Diff: 0.4]
> >    [Code Root Fixup: 0.1 ms]
> >    [Code Root Purge: 0.0 ms]
> >    [Clear CT: 6.7 ms]
> >    [Other: 16.9 ms]
> >       [Choose CSet: 0.0 ms]
> >       [Ref Proc: 5.2 ms]
> >       [Ref Enq: 0.0 ms]
> >       [Redirty Cards: 9.2 ms]
> >       [Humongous Register: 0.3 ms]
> >       [Humongous Reclaim: 0.0 ms]
> >       [Free CSet: 0.7 ms]
> >
> > Please help to solve this issue!
> > Thanks in advance!
> > Regards!
> > Vigz
