Hi Shawn, I can try to help you with the test. I have a 6-node Solr cluster (machines with 4 cores, 28 GB RAM, and a 250 GB disk, running OpenJDK 11.0.11) with 2 shards and 3 replicas each.
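Before touching any settings, it is probably worth confirming that the local OpenJDK build actually ships the Shenandoah collector, since some vendor builds compile it out. A minimal probe (nothing Solr-specific, just asking the JVM whether it accepts the flag):

```shell
# If the build lacks Shenandoah, the JVM rejects the flag and
# exits non-zero before printing the version banner.
if java -XX:+UseShenandoahGC -version >/dev/null 2>&1; then
    echo "Shenandoah available"
else
    echo "Shenandoah not in this build"
fi
```

Availability varies by vendor and update level, so probing the flag like this is more reliable than checking version numbers alone.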
Currently the cluster has 27 GB of data per core; I can ingest more data to bring it to around 100 GB per core. The nodes have a 20 GB heap as of now, which I will change to 4 GB for the test. Here are the current GC settings from my cluster; please let me know if we need to change anything besides the heap size before the test:

-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ParallelRefProcEnabled
-XX:+PerfDisableSharedMem
-XX:+UseG1GC
-XX:+UseLargePages
-XX:-OmitStackTraceInFastThrow
-XX:ConcGCThreads=4
-XX:G1ReservePercent=18
-XX:HeapDumpPath=/app/solrdata8/logs/heapdump
-XX:InitiatingHeapOccupancyPercent=50
-XX:MaxGCPauseMillis=250
-XX:MaxNewSize=4G
-XX:OnOutOfMemoryError=/app/solr8/bin/oom_solr.sh 8983 /app/solrdata8/logs
-XX:ParallelGCThreads=8
-Xlog:gc*:file=/app/solrdata8/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
-Xms20g
-Xmx20g
-Xss256k

On Tue, Oct 12, 2021 at 2:24 AM Shawn Heisey <[email protected]> wrote:

> I would like to request help from the community on something. I'm not
> in a position to do the kind of testing that I want, as I no longer have
> access to Solr servers with large amounts of data.
>
> What I want to test is the Shenandoah garbage collector. I've done some
> testing on my own, but the index is very small (629MB) and so is the
> heap size (512MB).
>
> Here is a GC log from my most recent test:
>
> https://www.dropbox.com/s/8cbncuax7kv0x9c/solr_gc.log?dl=0
>
> For this test, I deleted all the GC logs, restarted Solr, deleted all
> docs and optimized the index so it had 0 segments, and then asked
> dovecot (POP/IMAP server) to do a full reindex. At this moment there
> are 158905 docs in the index. Then I grabbed the GC log linked above
> and had the gceasy.io website analyze it. The GC performance looks very
> good ... but with the heap at only 512MB, even a bad GC config would
> probably look good.
> Here are the GC settings that I put in /etc/default/solr.in.sh:
>
> GC_TUNE=" \
>   -XX:+AlwaysPreTouch \
>   -XX:+UseNUMA \
>   -XX:+UseShenandoahGC \
>   -XX:+ParallelRefProcEnabled \
>   -XX:+UseStringDeduplication \
>   -XX:ParallelGCThreads=2 \
> "
>
> I'm running this on a t3a.medium EC2 instance, which only has 2 CPUs, so
> I limited the GC threads to 2. This instance is my personal mail
> server. If anyone brave enough to help me test wants to try it, and you
> have a server with a LOT of cores, you could increase the number of
> threads.
>
> What I need to see is the GC logs that Solr creates, along with some
> details about the indexes on the server that generated the log. Best
> results will come from very busy servers that have a large index ...
> hoping for 100GB or more of index per Solr core, and a max heap size of at
> least 4GB. If you want to get really adventurous, you could gather GC
> logs with the default GC settings (which in later Solr versions is G1GC)
> and with Shenandoah.
>
> A recent version of Java 11 is required to enable the Shenandoah
> collector. I think it was made available in 11.0.3. I am running
> OpenJDK 11.0.11, the latest available on Ubuntu 20.04 LTS.
>
> I'm not advocating that anyone try this on a mission-critical production
> system, but I would not expect it to cause problems on such a setup.
> Use your own judgement.
>
> Thanks,
> Shawn

--
Best Regards,
Dinesh Naik
