In addition to the other advice so far, I'd strongly recommend disabling swap if applicable, especially since you've already tried varying the heap size without the desired effect:

https://solr.apache.org/guide/8_8/taking-solr-to-production.html#disabling-swap

Some good background reading on virtual memory, swapping, and garbage collection (all likely relevant to your case) can be found here:

https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
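
If it helps, here is a rough Python sketch of the two checks I have in mind: whether the node is actually using swap, and how much RAM is left for the OS page cache once the heap is fully committed (the point Walter makes in the message quoted below). The function names and the sample text are made up for illustration; on a real Linux node you would feed it the contents of /proc/meminfo. The budget function is plain subtraction and ignores everything else running on the box.

```python
def swap_in_use_kb(meminfo_text):
    """Return swap in use (kB) from /proc/meminfo-style text.

    Hypothetical helper: it only looks at the SwapTotal and SwapFree fields.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <integer> [unit]" lines.
        if sep and parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


def page_cache_budget_gb(ram_gb, heap_gb):
    """RAM left for OS file buffers when the JVM heap is fully committed.

    Simplification: ignores JVM overhead outside the heap and other processes.
    """
    return max(ram_gb - heap_gb, 0)


if __name__ == "__main__":
    # Made-up sample; on Linux, read the real text from /proc/meminfo instead.
    sample = (
        "MemTotal:       314572800 kB\n"
        "SwapTotal:        8388604 kB\n"
        "SwapFree:         6291452 kB\n"
    )
    print("swap in use (kB):", swap_in_use_kb(sample))

    # Figures from this thread: 300 GB RAM per node, Xmx 300 GB vs. 16 GB.
    print("page cache with Xmx=300G:", page_cache_budget_gb(300, 300), "GB")
    print("page cache with Xmx=16G: ", page_cache_budget_gb(300, 16), "GB")
```

A non-zero swap figure on a Solr node is the thing to look for. With the current settings the second number is zero, which means none of the 300 GB index can stay in file buffers.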
On Mon, May 10, 2021 at 10:27 AM Walter Underwood <[email protected]> wrote:
>
> That heap is way, too big. Solr does NOT pull the entire index into the JVM heap.
> You need RAM that is not in the heap for the OS to keep the index in file buffers.
>
> Try again with:
>
> -Xms16G
> -Xmx16G
>
> If you run out of that, try 31G.
>
> Starting and maximum heap size should always be the same for a server process.
> The JVM will increase it to max before doing a full GC.
>
> Also, there is no need for 30 GC threads. Leave that out and use the default.
>
> Finally, the list strips images, so nobody could see the image. Upload it and link it, please.
>
> wunder
> Walter Underwood
> [email protected]
> http://observer.wunderwood.org/ (my blog)
>
> > On May 9, 2021, at 10:54 PM, Vignan Malyala <[email protected]> wrote:
> >
> > Hi everyone,
> >
> > We have 3 cluster solr running in 3 different machines with an index size of 300 GB.
> > RAM: 300 GB per node
> > Heap - Xms: 240GB Xmx: 300GB
> > Index size: 300GB
> >
> > GC_TUNE="-XX:+UseG1GC
> > -XX:InitiatingHeapOccupancyPercent=45
> > -XX:ConcGCThreads=6
> > -XX:ParallelGCThreads=30
> > -XX:G1ReservePercent=20
> >
> > <autoCommit>
> > <maxTime>${solr.autoCommit.maxTime:400000}</maxTime>
> > <openSearcher>false</openSearcher>
> > </autoCommit>
> >
> > <autoSoftCommit>
> > <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
> > </autoSoftCommit>
> >
> > Our cloud servers suddenly stopped yesterday. When we try to restart, our
> > JVM heap size goes to max of 300 GB just in few seconds and we get the
> > following message before stopping automatically.
> >
> > Heap before GC invocations=0 (full 0):
> > garbage-first heap total 251658240K, used 360448K [0x00007eba80000000, 0x00007eba8200f000, 0x00007f0580000000)
> > region size 32768K, 12 young (393216K), 0 survivors (0K)
> > Metaspace used 20504K, capacity 21158K, committed 21248K, reserved 22528K
> > 2021-05-10T05:31:59.511+0000: 3.036: [GC pause (Metadata GC Threshold) (young) (initial-mark)
> > Desired survivor size 805306368 bytes, new threshold 15 (max 15)
> >
> > {Heap before GC invocations=11 (full 0):
> > garbage-first heap total 288849920K, used 20398080K [0x00007eba80000000, 0x00007eba82011378, 0x00007f0580000000)
> > region size 32768K, 440 young (14417920K), 54 survivors (1769472K)
> > Metaspace used 58413K, capacity 61495K, committed 61696K, reserved 63488K
> > 2021-05-10T05:33:15.477+0000: 79.002: [GC pause (G1 Evacuation Pause) (young)
> > Desired survivor size 922746880 bytes, new threshold 1 (max 15)
> > - age 1: 1043976736 bytes, 1043976736 total
> > - age 2: 766998080 bytes, 1810974816 total
> > , 0.4319767 secs]
> >   [Parallel Time: 408.3 ms, GC Workers: 30]
> >     [GC Worker Start (ms): Min: 79002.5, Avg: 79003.0, Max: 79003.6, Diff: 1.2]
> >     [Ext Root Scanning (ms): Min: 0.1, Avg: 0.8, Max: 2.7, Diff: 2.6, Sum: 23.7]
> >     [Update RS (ms): Min: 0.0, Avg: 1.7, Max: 3.1, Diff: 3.1, Sum: 51.7]
> >       [Processed Buffers: Min: 0, Avg: 3.8, Max: 17, Diff: 17, Sum: 113]
> >     [Scan RS (ms): Min: 13.9, Avg: 15.8, Max: 16.7, Diff: 2.8, Sum: 474.0]
> >     [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 2.1, Diff: 2.1, Sum: 4.3]
> >     [Object Copy (ms): Min: 385.5, Avg: 387.5, Max: 390.6, Diff: 5.1, Sum: 11624.2]
> >     [Termination (ms): Min: 0.1, Avg: 0.5, Max: 0.9, Diff: 0.9, Sum: 13.8]
> >       [Termination Attempts: Min: 1, Avg: 82.1, Max: 172, Diff: 171, Sum: 2464]
> >     [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.4, Diff: 0.4, Sum: 3.6]
> >     [GC Worker Total (ms): Min: 405.9, Avg: 406.5, Max: 407.3, Diff: 1.4, Sum: 12195.3]
> >     [GC Worker End (ms): Min: 79409.4, Avg: 79409.5, Max: 79409.8, Diff: 0.4]
> >   [Code Root Fixup: 0.1 ms]
> >   [Code Root Purge: 0.0 ms]
> >   [Clear CT: 6.7 ms]
> >   [Other: 16.9 ms]
> >     [Choose CSet: 0.0 ms]
> >     [Ref Proc: 5.2 ms]
> >     [Ref Enq: 0.0 ms]
> >     [Redirty Cards: 9.2 ms]
> >     [Humongous Register: 0.3 ms]
> >     [Humongous Reclaim: 0.0 ms]
> >     [Free CSet: 0.7 ms]
> >
> > Please help to solve this issue!
> > Thanks in advance!
> > Regards!
> > Vigz
