JVM version? We’re running v8 update 121 with the G1 collector and it is working really well. We also have an 8GB heap.
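If it helps, the whole thing is only a couple of lines in solr.in.sh (SOLR_HEAP and GC_TUNE are the stock variables in recent Solr start scripts; the values below are just a sketch of roughly what we run, not a recommendation):

    SOLR_HEAP="8g"
    GC_TUNE="-XX:+UseG1GC"

Heap size in one variable, collector flags in the other.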
Graph your heap usage. You’ll see a sawtooth shape, where it grows, then there is a major GC. The maximum of the base of the sawtooth is the working set of heap that your Solr installation needs. Set the heap to that value, plus a gigabyte or so. We run with a 2GB eden (new space) because so much of Solr’s allocations have a lifetime of one request. So: the base of the sawtooth, plus a gigabyte of breathing room, plus two more for eden. If the base of your sawtooth is 3GB, for example, that works out to a 6GB heap. That should work.

I don’t set all the ratios and stuff. When we were running CMS, I set a size for the heap and a size for the new space. Done. With G1, I don’t even get that fussy.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 11, 2017, at 8:22 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> On 4/11/2017 2:56 PM, Chetas Joshi wrote:
>> I am using Solr (5.5.0) on HDFS. SolrCloud of 80 nodes. Solr collection
>> with number of shards = 80 and replication factor = 2.
>>
>> Solr JVM heap size = 20 GB
>> solr.hdfs.blockcache.enabled = true
>> solr.hdfs.blockcache.direct.memory.allocation = true
>> MaxDirectMemorySize = 25 GB
>>
>> I am querying a Solr collection with index size = 500 MB per core.
>
> I see that you and I have traded messages before on the list.
>
> How much total system memory is there per server? How many of these
> 500MB cores are on each server? How many docs are in a 500MB core? The
> answers to these questions may affect the other advice that I give you.
>
>> The off-heap memory (25 GB) is huge so that it can load the entire index.
>
> I still know very little about how HDFS handles caching and memory. You
> want to be sure that as much data as possible from your indexes is
> sitting in local memory on the server.
>
>> Using the cursor approach (number of rows = 100K), I read 2 fields (total 40
>> bytes per Solr doc) from the Solr docs that satisfy the query. The docs are
>> sorted by "id" and then by those 2 fields.
>>
>> I am not able to understand why the heap memory is getting full and full
>> GCs are running consecutively with long GC pauses (> 30 seconds). I am
>> using the CMS GC.
>
> A 20GB heap is quite large. Do you actually need it to be that large?
> If you graph JVM heap usage over a long period of time, what are the low
> points in the graph?
>
> A result containing 100K docs is going to be pretty large, even with a
> limited number of fields. It is likely to be several megabytes. It
> will need to be built entirely in heap memory before it is sent to
> the client -- both as Lucene data structures (which will probably be
> much larger than the actual response due to Java overhead) and as the
> actual response format. Then it all becomes garbage as soon as the response
> is done. Repeat this enough times, and you're going to go through even
> a 20GB heap pretty fast, and need a full GC. Full GCs on a 20GB heap
> are slow.
>
> You could try switching to G1, as long as you realize that you're going
> against advice from Lucene experts... but honestly, I do not expect
> this to really help, because you would probably still need full GCs due
> to the rate that garbage is being created. If you do try it, I would
> strongly recommend the latest Java 8, either Oracle or OpenJDK. Here's
> my wiki page where I discuss this:
>
> https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_First.29_Collector
>
> Reducing the heap size (which may not be possible -- we need to know the
> answer to the question about memory graphing) and reducing the number of
> rows per query are the only quick solutions I can think of.
>
> Thanks,
> Shawn
>
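P.S. If you do drop the page size, the cursor loop itself doesn’t change. Roughly (collection and field names are placeholders, and rows=10000 is just an example value):

    curl 'http://localhost:8983/solr/mycollection/select?q=*:*&fl=id,f1,f2&sort=id+asc&rows=10000&cursorMark=*'

Take the nextCursorMark out of each response, pass it as cursorMark on the next request, and stop when the cursor value stops changing. Each page then only builds about 10K docs on the heap instead of 100K, which is far less garbage per request.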