In the past I have told people on this list and in the IRC channel #solr
what I use for Java GC settings. A couple of days ago, I cleaned up my
testing methodology to more closely mimic real production queries, and
discovered that my GC settings were woefully inadequate. I had been
using the settings below for several months on a virtual machine with
9GB of RAM, and chose them because I had read several things praising
them. I should have done more research. Here's what I was using:
-Xms512M -Xmx2048M -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
On my backup servers, I am in the process of getting Solr 3.2.0 ready to
replace our 1.4.1 index. I ran into a situation where committing a
delta-import of only a few thousand records took longer than 3 minutes
(Perl LWP default timeout) on every shard, where normally in production
on 1.4.1 it only takes a few seconds. This was shortly after I had hit
the distributed index pretty hard with my improved benchmarking.
Using jstat, I found that while under benchmarking load, the system was
spending 10-15% of its time doing garbage collection, and that most of
the garbage collections were in the young generation. First I tried
increasing the young generation size with the -XX:NewSize=1024M
parameter. This reduced the total GC count, but didn't really reduce
how much time was spent in collections.
A good command to see these statistics on Linux, and an Oracle link
explaining what it all means:
jstat -gc -t `pgrep java` 5000
http://download.oracle.com/javase/6/docs/technotes/tools/share/jstat.html
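For anyone who wants to turn that jstat output into a percentage like the 10-15% figure above, here is a rough Python sketch. It assumes the output was captured from `jstat -gc -t`, where Timestamp is seconds since JVM start and GCT is cumulative GC time in seconds. The numbers in SAMPLE are made up purely for illustration (they are not from my servers), and the function names are my own.

```python
# Illustrative sample only: these values are invented, not real measurements.
SAMPLE = """\
Timestamp  S0C    S1C    S0U  S1U    EC      EU      OC        OU       PC      PU      YGC YGCT  FGC FGCT GCT
1000.0     8192.0 8192.0 0.0  4096.0 65536.0 30000.0 1048576.0 500000.0 65536.0 40000.0 400 49.5  2   0.5  50.0
1005.0     8192.0 8192.0 0.0  4096.0 65536.0 12000.0 1048576.0 500000.0 65536.0 40000.0 405 50.25 2   0.5  50.75
1010.0     8192.0 8192.0 0.0  4096.0 65536.0 45000.0 1048576.0 500000.0 65536.0 40000.0 410 50.75 2   0.5  51.25
"""


def parse_jstat(text):
    """Parse `jstat -gc -t` output into a list of {column name: float} dicts."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header = rows[0]
    # Skip repeated header lines (jstat prints them again when -h is used).
    return [dict(zip(header, map(float, r))) for r in rows[1:] if r != header]


def gc_overhead(samples):
    """Return (fraction of wall time spent in GC, young-gen share of that GC time)
    between the first and last sample."""
    first, last = samples[0], samples[-1]
    elapsed = last["Timestamp"] - first["Timestamp"]
    gc_time = last["GCT"] - first["GCT"]
    young_time = last["YGCT"] - first["YGCT"]
    overall = gc_time / elapsed if elapsed > 0 else 0.0
    young_share = young_time / gc_time if gc_time > 0 else 0.0
    return overall, young_share


overall, young_share = gc_overhead(parse_jstat(SAMPLE))
print(f"GC overhead: {overall:.1%}, young generation share: {young_share:.0%}")
```

Columns are looked up by header name rather than position, so the same sketch should cope with JVM versions that print a different set of columns, as long as Timestamp, YGCT and GCT are present. To use it on real data, redirect the jstat output to a file and pass the file contents to parse_jstat() in place of SAMPLE.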
I've learned that Solr will keep most of its data in the young
generation (eden) unless that memory pool is too small, in which case
the data gets moved to the tenured generation. The key to good
performance seems to be creating a large enough young generation. You
do still need a good chunk of tenured space, unless the Solr instance
has no index of its own and exists only to distribute queries to shards
living on other Solr instances; in that case, it hardly uses the
tenured generation at all. It turns out that CMSIncrementalMode causes
more young generation collections and makes them take longer, which is
exactly what Solr does NOT need.
After messing around with it for quite a while, I came up with the
following settings, which included an increase in heap size:
-Xms3072M -Xmx3072M -XX:NewSize=1536M -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
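To make the split between generations explicit: the tenured generation is roughly whatever is left of the heap after the young generation is carved out. The small Python sketch below just does that subtraction from the flags above. The size_mb helper is a hypothetical name of mine, and the result is approximate, since the JVM may round sizes for alignment and the young generation itself is divided into eden plus two survivor spaces.

```python
import re

# Multipliers to convert a JVM size suffix to megabytes.
UNITS = {"K": 1 / 1024, "M": 1, "G": 1024}


def size_mb(flags, prefix):
    """Return the size in MB from the flag starting with `prefix`, or None if absent."""
    match = re.search(re.escape(prefix) + r"(\d+)([KMG])", flags, re.IGNORECASE)
    if match is None:
        return None
    return int(match.group(1)) * UNITS[match.group(2).upper()]


FLAGS = ("-Xms3072M -Xmx3072M -XX:NewSize=1536M -XX:+UseParNewGC "
         "-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled")

heap = size_mb(FLAGS, "-Xmx")
young = size_mb(FLAGS, "-XX:NewSize=")
tenured = heap - young  # what remains for the tenured generation
print(f"heap={heap}M young={young}M tenured={tenured}M")
```

With a 3072M heap and a 1536M young generation, this leaves about 1536M (1.5GB) for tenured.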
With these settings, it spends very little time doing garbage
collections. One of my shards has been up for nearly 24 hours, has been
hit with the benchmarking script repeatedly, and it has only done 62
young generation collections, and zero full collections, with 6.8
seconds total GC time. I am thinking of increasing the NewSize yet
again, because the tenured generation (1.5GB in size) is only one third
utilized after nearly 24 hours.
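For a sense of scale, the numbers above can be worked through in a few lines of Python. This is only back-of-the-envelope arithmetic: the uptime is rounded to a full 24 hours (an assumption, since it was "nearly" that), and because there were no full collections, all of the GC time is attributed to young generation collections.

```python
young_collections = 62
total_gc_seconds = 6.8
uptime_seconds = 24 * 3600  # "nearly 24 hours", rounded up: an assumption

# Average time per young generation collection, in milliseconds.
avg_collection_ms = total_gc_seconds / young_collections * 1000
# Share of total uptime spent in GC, as a percentage.
overhead_pct = total_gc_seconds / uptime_seconds * 100

print(f"{avg_collection_ms:.0f} ms per collection, "
      f"{overhead_pct:.4f}% of uptime spent in GC")
```

That works out to roughly 110 ms per collection and well under a hundredth of a percent of uptime, although this figure covers idle time as well as benchmarking runs, so it is not directly comparable to the 10-15% measured under load.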
My settings will probably not work for everyone, but I hope this post
will make it easier for others to find the right solution for themselves.
Thanks,
Shawn