On 7/22/2013 6:45 AM, Markus Jelsma wrote:
> You should increase your ZK time out, this may be the issue in your case. You
> may also want to try the G1GC collector to keep STW under ZK time out.

When I tried G1, the occasional stop-the-world GC actually got worse. I tried G1 after trying CMS with no other tuning parameters. The average GC time went down, but when it got into a place where it had to do a stop-the-world collection, it was worse.

Based on the GC statistics in jvisualvm and jstat, I didn't think I had a problem. The way I discovered that I had a problem was by looking at my haproxy load balancer -- sometimes requests would be sent to a backup server instead of my primary, because the ping request handler was timing out on the LB health check. The LB was set to time out after five seconds.

When I went looking deeper with the GC log and some other tools, I was seeing 8-10 second GC pauses. G1 was showing me pauses of 12 seconds.

Now I use a heavily tuned CMS config, and there are no more LB switches to a backup server. I've put some of my own information about my GC settings on my personal Solr wiki page:

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

I've got an 8GB heap on my systems running 3.5.0 (one copy of the index) and a 6GB heap on those running 4.2.1 (the other copy of the index).

Summary: Just switching to the G1 collector won't solve GC pause problems. There's not a lot of G1 tuning information out there yet. If someone can come up with a good set of G1 tuning parameters, G1 might become better than CMS.

Thanks,
Shawn
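
For anyone who wants to check their own GC log for pauses longer than a load balancer health-check timeout (five seconds in the setup described above), a rough Python sketch follows. It assumes the JVM was started with -XX:+PrintGCApplicationStoppedTime and that the log contains lines of the form "Total time for which application threads were stopped: N seconds". The function name long_pauses and the sample lines are made up for illustration; they are not taken from the logs discussed in this message, and the exact line format may differ between JVM versions.

```python
import re

# Assumed line format, as written by HotSpot when started with
# -XX:+PrintGCApplicationStoppedTime. Adjust the pattern if your JVM differs.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"([0-9]+\.[0-9]+) seconds"
)


def long_pauses(lines, threshold_seconds=5.0):
    """Return every stop-the-world pause (in seconds) above the threshold."""
    pauses = []
    for line in lines:
        match = STOPPED.search(line)
        if match is None:
            continue  # not a stopped-time line
        seconds = float(match.group(1))
        if seconds > threshold_seconds:
            pauses.append(seconds)
    return pauses


# Hypothetical sample lines, for illustration only (not real log output).
sample = [
    "Total time for which application threads were stopped: 0.0421370 seconds",
    "Total time for which application threads were stopped: 8.7312210 seconds",
    "Total time for which application threads were stopped: 12.0145000 seconds",
]

for pause in long_pauses(sample):
    print("pause longer than health check timeout: %.2f s" % pause)
```

To scan a real log, pass an open file instead of the sample list, for example long_pauses(open("gc.log")), where gc.log is whatever path your -Xloggc option points to. Averages from jstat can look fine while a few individual pauses like these still exceed the timeout, which is why the sketch looks at each pause rather than a mean.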