Hi Vigz,

If your machine has 300GB of RAM, you’d want a heap well below that, to leave 
room for the OS to cache index files. With a 300GB heap, the real memory usage 
of the JVM will likely go beyond your RAM, in which case either JVM memory goes 
to swap (which would kill performance) or, if swap is disabled, the OS kills 
the process (this sounds like your case; you can double-check in dmesg).
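
To make the dmesg check concrete, here is a rough sketch. The function name 
find_oom_kills is just something I made up, and the patterns are the usual 
wording of Linux OOM-killer messages, so adjust them if your kernel logs look 
different.

```shell
# Filter kernel log text for signs of the OOM killer.
# Reads from stdin, so it works with dmesg output or a saved log file.
# find_oom_kills is a hypothetical helper name, not a standard tool.
find_oom_kills() {
  # "|| true" keeps the exit status at 0 when nothing matches
  grep -i -E 'out of memory|oom-killer|killed process' || true
}

# Typical use on the Solr host (may need root):
#   dmesg -T | find_oom_kills
```

If a java process shows up there around the time Solr stopped, the OS killed 
it for using too much memory.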

I also doubt you need 300GB of heap for a 300GB index. In most use-cases, 30GB 
of heap would be plenty.

Here’s what you can try first:
- lower the heap size. Try half (150GB) and move from there. You’ll want to 
monitor your actual heap usage (we have a tool that does that, the link is in 
my signature) and adjust further. I would be surprised if you need more than 
30GB.
- if you have a recent Java version (I would recommend Java 11), the defaults 
for G1GC should be pretty sensible. So you may revert to Solr’s defaults, maybe 
remove the ConcGCThreads and ParallelGCThreads and rely on the Java defaults if 
you have a big box (like you seem to have).
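
As a sketch of those two changes, assuming you configure Solr through 
solr.in.sh (the 150g value is only the first step suggested above, not a 
recommendation for where to end up):

```shell
# solr.in.sh fragment (sketch, values are starting points to adjust)

# SOLR_HEAP sets both -Xms and -Xmx to the same value.
SOLR_HEAP="150g"

# Leave GC_TUNE commented out so the start script uses its own GC settings
# (my understanding of how recent Solr versions behave; please verify on yours).
#GC_TUNE="-XX:+UseG1GC ..."
```

After restarting, watch the actual heap usage before lowering SOLR_HEAP 
further.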

Lastly, you may want to hard commit more often (in autoCommit, set maxSize to 
100m or so and move from there). You may also want to enable autoSoftCommit, 
instead of committing from the application.
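
Since your solrconfig.xml reads the soft commit interval from the 
solr.autoSoftCommit.maxTime property, one way to enable it without editing the 
XML is a system property in solr.in.sh. This is only a sketch, and the 60000 
(one minute) is an assumption on my part, so pick whatever visibility delay 
your application can tolerate. The maxSize setting for hard commits still has 
to be added inside the autoCommit element of solrconfig.xml itself.

```shell
# solr.in.sh fragment (sketch)
# Overrides the -1 default in ${solr.autoSoftCommit.maxTime:-1}.
# 60000 ms is an assumed example value, not something taken from your setup.
SOLR_OPTS="${SOLR_OPTS:-} -Dsolr.autoSoftCommit.maxTime=60000"
```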

Best regards,
Radu
--
Sematext Cloud - Full Stack Observability - https://sematext.com
Solr and Elasticsearch Consulting, Training and Production Support

> On 10 May 2021, at 08:54, Vignan Malyala <[email protected]> wrote:
> 
> Hi everyone,
> 
> We have 3 cluster solr running in 3 different machines with an index size of 
> 300 GB.
> RAM: 300 GB per node
> Heap - Xms: 240GB Xmx: 300GB
> Index size: 300GB
> 
> GC_TUNE="-XX:+UseG1GC
> -XX:InitiatingHeapOccupancyPercent=45
> -XX:ConcGCThreads=6
> -XX:ParallelGCThreads=30
> -XX:G1ReservePercent=20
> 
> <autoCommit>
>      <maxTime>${solr.autoCommit.maxTime:400000}</maxTime>
>      <openSearcher>false</openSearcher>
>    </autoCommit>
> 
>  <autoSoftCommit>
>      <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
>    </autoSoftCommit>
> 
> 
> 
> Our cloud servers suddenly stopped yesterday. When we try to restart, our JVM 
> heap size goes to max of 300 GB just in few seconds and we get the following 
> message before stopping automatically.
> 
> 
> 
> Heap before GC invocations=0 (full 0):
> garbage-first heap   total 251658240K, used 360448K [0x00007eba80000000, 
> 0x00007eba8200f000, 0x00007f0580000000)
>  region size 32768K, 12 young (393216K), 0 survivors (0K)
> Metaspace       used 20504K, capacity 21158K, committed 21248K, reserved 
> 22528K
> 2021-05-10T05:31:59.511+0000: 3.036: [GC pause (Metadata GC Threshold) 
> (young) (initial-mark)
> Desired survivor size 805306368 bytes, new threshold 15 (max 15)
> 
> 
> 
> 
> {Heap before GC invocations=11 (full 0):
> garbage-first heap   total 288849920K, used 20398080K [0x00007eba80000000, 
> 0x00007eba82011378, 0x00007f0580000000)
>  region size 32768K, 440 young (14417920K), 54 survivors (1769472K)
> Metaspace       used 58413K, capacity 61495K, committed 61696K, reserved 
> 63488K
> 2021-05-10T05:33:15.477+0000: 79.002: [GC pause (G1 Evacuation Pause) (young)
> Desired survivor size 922746880 bytes, new threshold 1 (max 15)
> - age   1: 1043976736 bytes, 1043976736 total
> - age   2:  766998080 bytes, 1810974816 total
> , 0.4319767 secs]
>   [Parallel Time: 408.3 ms, GC Workers: 30]
>      [GC Worker Start (ms): Min: 79002.5, Avg: 79003.0, Max: 79003.6, Diff: 
> 1.2]
>      [Ext Root Scanning (ms): Min: 0.1, Avg: 0.8, Max: 2.7, Diff: 2.6, Sum: 
> 23.7]
>      [Update RS (ms): Min: 0.0, Avg: 1.7, Max: 3.1, Diff: 3.1, Sum: 51.7]
>         [Processed Buffers: Min: 0, Avg: 3.8, Max: 17, Diff: 17, Sum: 113]
>      [Scan RS (ms): Min: 13.9, Avg: 15.8, Max: 16.7, Diff: 2.8, Sum: 474.0]
>      [Code Root Scanning (ms): Min: 0.0, Avg: 0.1, Max: 2.1, Diff: 2.1, Sum: 
> 4.3]
>      [Object Copy (ms): Min: 385.5, Avg: 387.5, Max: 390.6, Diff: 5.1, Sum: 
> 11624.2]
>      [Termination (ms): Min: 0.1, Avg: 0.5, Max: 0.9, Diff: 0.9, Sum: 13.8]
>         [Termination Attempts: Min: 1, Avg: 82.1, Max: 172, Diff: 171, Sum: 
> 2464]
>      [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.4, Diff: 0.4, Sum: 3.6]
>      [GC Worker Total (ms): Min: 405.9, Avg: 406.5, Max: 407.3, Diff: 1.4, 
> Sum: 12195.3]
>      [GC Worker End (ms): Min: 79409.4, Avg: 79409.5, Max: 79409.8, Diff: 0.4]
>   [Code Root Fixup: 0.1 ms]
>   [Code Root Purge: 0.0 ms]
>   [Clear CT: 6.7 ms]
>   [Other: 16.9 ms]
>      [Choose CSet: 0.0 ms]
>      [Ref Proc: 5.2 ms]
>      [Ref Enq: 0.0 ms]
>      [Redirty Cards: 9.2 ms]
>      [Humongous Register: 0.3 ms]
>      [Humongous Reclaim: 0.0 ms]
>      [Free CSet: 0.7 ms]
> 
> 
> Please help to solve this issue!
> Thanks in advance!
> Regards!
> Vigz
> 
> 
