Hi Walter,

you can check whether your OOM hook is actually acknowledged and set
up by the JVM. The options for that are "-XX:+PrintFlagsFinal -version".

You can modify your bin/solr script and tweak the function "launch_solr"
at the end of the script: replace "-jar start.jar" with
"-XX:+PrintFlagsFinal -version".
Instead of starting Solr, this prints a huge list of all JVM parameters
that are actually accepted and in effect.
Check what "ccstrlist OnOutOfMemoryError" tells you.
Is it really pointing to your OOM script?
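
If you want a quick look without touching bin/solr, a minimal sketch
(assuming the "java" on your PATH is the same JVM Solr runs on, and
substituting your actual Solr JVM options for the placeholder):

  java <your Solr JVM options> -XX:+PrintFlagsFinal -version | grep OnOutOfMemoryError

If the hook is set up, the value should show the path to your OOM
script (bin/oom_solr.sh in a stock install).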

You can also raise MaxGCPauseMillis to give the GC more time to clean up
per cycle.

The default InitiatingHeapOccupancyPercent is 45; try 75 by setting
-XX:InitiatingHeapOccupancyPercent=75.
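
Applied to your GC_TUNE block, that could look like this (just a sketch;
the 500ms pause target is an example value to experiment with, not a
recommendation):

GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=500 \
-XX:InitiatingHeapOccupancyPercent=75 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"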



By the way, do you really use UseLargePages on your system
(the OS must also support it), or is the JVM parameter
just set because someone else is also using it?
http://www.oracle.com/technetwork/java/javase/tech/largememory-jsp-137182.html
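
On Linux, a quick check whether huge pages are configured at all
(a sketch; the exact fields vary by kernel):

  grep Huge /proc/meminfo

If HugePages_Total is 0, the JVM cannot actually get large pages and
the flag is effectively a no-op (the JVM typically just warns and
falls back to normal pages).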


Regards,
Bernd


On 21.11.2017 at 02:17, Walter Underwood wrote:
> When I ran load benchmarks with 6.3.0, an overloaded cluster would get super 
> slow but keep functioning. With 6.5.1, we hit 100% CPU, then start getting 
> OOMs. That is really bad, because it means we need to reboot every node in 
> the cluster.
> 
> Also, the JVM OOM hook isn’t running the process killer (JVM 1.8.0_121-b13). 
> Using the G1 collector with the Shawn Heisey settings in an 8G heap.
> 
> GC_TUNE=" \
> -XX:+UseG1GC \
> -XX:+ParallelRefProcEnabled \
> -XX:G1HeapRegionSize=8m \
> -XX:MaxGCPauseMillis=200 \
> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> "
> 
> This is not good behavior in prod. The process goes to the bad place, then we 
> need to wait until someone is paged and kills it manually. Luckily, it 
> usually drops out of the live nodes for each collection and doesn’t take user 
> traffic.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
> 
