In the past I have told people on this list and in the IRC channel #solr what I use for Java GC settings. A couple of days ago, I cleaned up my testing methodology to more closely mimic real production queries, and discovered that my GC settings were woefully inadequate. Here's what I was using on a virtual machine with 9GB of RAM. I've been using this for several months, and chose it because I had read several things praising it. I should have done more research.

-Xms512M -Xmx2048M -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode

On my backup servers, I am in the process of getting 3.2.0 ready to replace our 1.4.1 index. I ran into a situation where committing a delta-import of only a few thousand records took longer than 3 minutes (Perl LWP default timeout) on every shard, where normally in production on 1.4.1 it only takes a few seconds. This was shortly after I had hit the distributed index pretty hard with my improved benchmarking.

Using jstat, I found that while under benchmarking load, the system was spending 10-15% of its time doing garbage collection, and that most of the collections were in the young generation. First I tried increasing the young generation size with the -XX:NewSize=1024M parameter. This reduced the total GC count, but didn't really reduce how much time was spent on collections.

A good command to see these statistics on Linux, and an Oracle link explaining what it all means:

jstat -gc -t `pgrep java` 5000
http://download.oracle.com/javase/6/docs/technotes/tools/share/jstat.html
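A minimal sketch of how the "percentage of time in GC" figure could be derived from that jstat output, assuming the Java 6 column layout where Timestamp is seconds since JVM start and GCT is cumulative GC time in seconds. The sample rows and the function names (parse_jstat, gc_summary) are made up for illustration and are not real measurements from these servers.

```python
# Made-up sample in the Java 6 "jstat -gc -t" column layout; not real measurements.
SAMPLE = """\
Timestamp S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT
100.0 1024.0 1024.0 0.0 512.0 8192.0 4096.0 20480.0 10240.0 21248.0 20000.0 10 1.000 0 0.000 1.000
105.0 1024.0 1024.0 512.0 0.0 8192.0 2048.0 20480.0 10240.0 21248.0 20000.0 14 1.600 0 0.000 1.600
"""


def parse_jstat(text):
    """Turn jstat output (one header line plus data rows) into a list of dicts."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, (float(value) for value in row))) for row in rows]


def gc_summary(samples):
    """Compare the first and last sample: GC time fraction and collection counts."""
    first, last = samples[0], samples[-1]
    elapsed = last["Timestamp"] - first["Timestamp"]
    if elapsed <= 0:
        raise ValueError("need at least two samples taken at different times")
    return {
        # Fraction of wall-clock time spent in GC between the two samples.
        "gc_fraction": (last["GCT"] - first["GCT"]) / elapsed,
        "young_collections": int(last["YGC"] - first["YGC"]),
        "full_collections": int(last["FGC"] - first["FGC"]),
    }


summary = gc_summary(parse_jstat(SAMPLE))
print("GC time: {:.1%}".format(summary["gc_fraction"]))
print("young:", summary["young_collections"], "full:", summary["full_collections"])
```

Looking columns up by header name rather than by position keeps the sketch usable on newer JVMs, where jstat reports different permanent-generation columns.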

I've learned that Solr will keep most of its data in the young generation (eden) unless that memory pool is too small, in which case it will move data to the tenured generation. The key to good performance seems to be creating a large enough young generation. You do need to have a good chunk of tenured available, unless the Solr instance has no index itself and exists only to distribute queries to shards living on other Solr instances. In that case, it hardly uses the tenured generation. It turns out that CMSIncrementalMode causes more young generation collections and makes them take longer, which is exactly what Solr does NOT need.
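To check how full each pool is, the EC/EU (eden capacity and usage) and OC/OU (old, i.e. tenured, capacity and usage) columns of jstat -gc can be compared. A small sketch, assuming one row has already been read into a dict keyed by column name; the numbers in the example row are invented and the function name pool_utilization is hypothetical.

```python
def pool_utilization(sample):
    """Return percent used for eden and tenured from one jstat -gc row.

    jstat reports these columns in KB; only the ratio matters here.
    """
    return {
        "eden_pct": 100.0 * sample["EU"] / sample["EC"],
        "tenured_pct": 100.0 * sample["OU"] / sample["OC"],
    }


# Invented example row: 1 GB eden with 256 MB used, 1.5 GB tenured with 512 MB used.
example_row = {"EC": 1048576.0, "EU": 262144.0, "OC": 1572864.0, "OU": 524288.0}
usage = pool_utilization(example_row)
print("eden {eden_pct:.1f}% used, tenured {tenured_pct:.1f}% used".format(**usage))
```

A tenured percentage that keeps climbing under load would suggest objects are being promoted out of a young generation that is too small.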

After messing around with it for quite a while, I came up with the following settings, which included an increase in heap size:

-Xms3072M -Xmx3072M -XX:NewSize=1536M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled

With these settings, it spends very little time doing garbage collections. One of my shards has been up for nearly 24 hours, has been hit with the benchmarking script repeatedly, and it has only done 62 young generation collections, and zero full collections, with 6.8 seconds total GC time. I am thinking of increasing the NewSize yet again, because the tenured generation (1.5GB in size) is only one third utilized after nearly 24 hours.
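The figures above can be turned into a few derived numbers. This is only back-of-the-envelope arithmetic, assuming the young generation stays at the configured NewSize and treating "nearly 24 hours" as exactly 24; the function name summarize is hypothetical.

```python
def summarize(heap_mb, new_mb, young_collections, total_gc_seconds, uptime_hours):
    """Derive tenured size, average pause, and GC overhead from the reported totals."""
    return {
        # Tenured is whatever part of the heap the young generation does not take.
        "tenured_mb": heap_mb - new_mb,
        # Average pause in milliseconds, assuming all GC time came from young collections.
        "avg_pause_ms": 1000.0 * total_gc_seconds / young_collections,
        # Share of uptime spent in GC, as a percentage.
        "gc_percent": 100.0 * total_gc_seconds / (uptime_hours * 3600.0),
    }


result = summarize(heap_mb=3072, new_mb=1536, young_collections=62,
                   total_gc_seconds=6.8, uptime_hours=24)
print("tenured: {tenured_mb} MB".format(**result))
print("average pause: {avg_pause_ms:.0f} ms".format(**result))
print("time in GC: {gc_percent:.4f}%".format(**result))
```

Under those assumptions the average young collection takes roughly a tenth of a second, and total GC time is under a hundredth of a percent of uptime, compared with the 10-15% seen with the old settings.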

My settings will probably not work for everyone, but I hope this post will make it easier for others to find the right solution for themselves.

Thanks,
Shawn
