Adding my anecdotes: 

I’m using heavily tuned ParNew/CMS. This is a SolrCloud collection, but per-node I’ve got a 28G heap and a 200G index. The large heap turned out to be necessary because certain operations in Lucene allocate memory based on things other than result size (typically index size or field cardinality), and small bursts of queries that hit those allocations would otherwise overflow directly into tenured space. Currently, I get a ParNew collection with a 200ms pause every few seconds. CMS collections happen every few hours.
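For context, "heavily tuned" here mostly means the usual CMS/ParNew knobs, explicit young-gen sizing above all. Something in this neighborhood, though the sizes below are illustrative rather than my exact production values:

-Xms28g -Xmx28g
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:NewSize=4g -XX:MaxNewSize=4g
-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+ParallelRefProcEnabled

The explicit young-gen sizing is what keeps the query-time allocation bursts out of tenured.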

I tested G1 at one point, and with a little tuning got it to about the same pause levels as the current configuration, but given a rough equality, I stuck with Lucene’s recommendation of CMS.
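If anyone wants to reproduce the comparison, the G1 side of it was roughly this shape (again illustrative; the pause target was the main knob I spent time on):

-Xms28g -Xmx28g
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:+ParallelRefProcEnabled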

As a result of this conversation, I actually went and tested the Parallel 
collector last night. I was frankly astonished to find that with no real tuning 
my query latency dropped across the board, from a few percent improvement in 
p50 to an almost 50% improvement in p99. Small collections happened at roughly 
the same rate, but with meaningfully lower pauses, and the pause durations were 
much more consistent.
However (and Shawn called this), after about an hour at full query load it accumulated enough tenured garbage to trigger a full collection, and it took a 14-second pause to shed about 20G from the heap. That’s less than the default SolrCloud ZooKeeper timeout, but still impolite.
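"No real tuning" really did mean almost nothing: swap the collector flags and let it size the generations adaptively. Roughly this, on a pre-9 JDK (the log path is just a placeholder; the logging flags are only there so the full collection is easy to see alongside the minor ones):

-Xms28g -Xmx28g
-XX:+UseParallelGC -XX:+UseParallelOldGC
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/solr/gc.log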

My conclusion is that Parallel is a lot better at cleaning eden space than ParNew (why?), but losing the concurrent tenured collection is still pretty nasty.

Even so, as a result of this experiment I’m seriously considering whether I should be using Parallel in production. I have a reasonable number of replicas, and I use 98th-percentile backup requests (https://github.com/whitepages/SOLR-4449) on shard fan-out requests, so a node suddenly locking up very occasionally only really hurts the top-level queries addressed to that node, not the shard requests. If I did backup requests at the SolrJ client level too (sketched below), the overall latency savings might still be worth it.
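For concreteness, a client-side version of the same idea might look roughly like this. This is a minimal sketch, not the SOLR-4449 code; the 250ms "p98" delay, the collection URLs, and the query string are made-up placeholders.

import java.util.concurrent.*;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BackupRequestDemo {
    // Hypothetical: measured p98 latency; after this long, hedge with a backup request.
    private static final long P98_DELAY_MS = 250;

    public static void main(String[] args) throws Exception {
        SolrClient primary = new HttpSolrClient.Builder("http://solr-a:8983/solr/collection1").build();
        SolrClient backup  = new HttpSolrClient.Builder("http://solr-b:8983/solr/collection1").build();
        ExecutorService pool = Executors.newCachedThreadPool();
        SolrQuery q = new SolrQuery("some query");

        // Fire the primary request immediately.
        CompletableFuture<QueryResponse> first =
            CompletableFuture.supplyAsync(() -> query(primary, q), pool);

        // Fire the backup only if the primary hasn't answered within the p98 delay,
        // e.g. because that node is stuck in a full GC.
        CompletableFuture<QueryResponse> second = CompletableFuture.supplyAsync(() -> {
            try { return first.get(P98_DELAY_MS, TimeUnit.MILLISECONDS); }
            catch (TimeoutException e) { return query(backup, q); }
            catch (Exception e) { throw new CompletionException(e); }
        }, pool);

        // Take whichever response comes back first.
        QueryResponse rsp = (QueryResponse) CompletableFuture.anyOf(first, second).get();
        System.out.println("numFound=" + rsp.getResults().getNumFound());

        pool.shutdown();
        primary.close();
        backup.close();
    }

    private static QueryResponse query(SolrClient client, SolrQuery q) {
        try { return client.query(q); }
        catch (Exception e) { throw new CompletionException(e); }
    }
}

The trade is extra query volume on the cluster in exchange for clipping the tail when one replica happens to be mid-pause.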



On 1/25/17, 5:35 PM, "Walter Underwood" <wun...@wunderwood.org> wrote:

    > On Jan 25, 2017, at 5:19 PM, Shawn Heisey <apa...@elyograg.org> wrote:
    > 
    >  It seems that Lucene/Solr
    > creates a lot of references as it runs, and collecting those in parallel
    > offers a significant performance advantage.
    
    This is critical for any tuning. Most of the query time allocations in Solr have the lifetime of a single request. Query parsing, result scoring, all that is garbage after the HTTP response is sent. So the GC must be configured with a large young generation (Eden, Nursery, whatever). If that generation cannot handle all the short-lived allocations under heavy load, they will be allocated from tenured space.

    Right now, we run with an 8G heap and 2G of young generation space with CMS/ParNew. We see a major GC every 30-60 minutes, depending on load.

    Cache evictions will always be garbage in tenured space, so we cannot avoid major GCs. The oldest non-accessed objects are evicted, and those will almost certainly be tenured.

    All this means that Solr puts a heavy burden on the GC: a combination of many short-lived allocations plus a steady flow of tenured garbage.
    
    wunder
    Walter Underwood
    wun...@wunderwood.org
    http://observer.wunderwood.org/  (my blog)
    
    
