I think that's the wrong question to ask. By using docValues you'll be able to
significantly reduce the heap allocated to the Java process, reduce the
overhead of garbage collections, reduce the chance of nodes going into
recovery, and generally increase stability. Compared to those gains, any
query-performance improvement is a secondary concern.
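
For example (a sketch only; the field names and types here are made up,
adjust to your own schema), docValues is a per-field attribute in schema.xml:

    <!-- fields you sort, facet, or group on get docValues="true" -->
    <field name="modified_date" type="pdate"  indexed="true" stored="true" docValues="true"/>
    <field name="category"      type="string" indexed="true" stored="true" docValues="true"/>

As mentioned earlier in the thread, changing this requires re-indexing from
scratch.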

Best,
Erick 

> On May 14, 2019, at 11:28 AM, Maulin Rathod <mrat...@asite.com> wrote:
> 
> Thanks Erick,
> 
> I understand that using docValues should improve query performance. Please
> correct me if my understanding is incorrect.
> 
> Regards,
> 
> Maulin
> 
> 
> 
> On May 14, 2019 19:11, Erick Erickson <erickerick...@gmail.com> wrote:
> Use docValues on all fields you group, facet or sort on.
> 
> NOTE: you _must_ re-index from scratch, I’d index to a new collection and 
> start over. Paradoxically your index size _on disk_ will increase, but your 
> JVM will need drastically less heap. See: 
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
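> 
> In SolrCloud, one way to do that (collection names here are only examples,
> and this is a sketch, not a tested recipe) is to create a fresh collection,
> re-index everything into it, then point an alias at it:
> 
>   curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll_v2&numShards=2&replicationFactor=2"
>   # ... re-index all documents into mycoll_v2 ...
>   curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=mycoll&collections=mycoll_v2"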
> 
> Best,
> Erick
> 
>> On May 14, 2019, at 1:11 AM, Maulin Rathod <mrat...@asite.com> wrote:
>> 
>> Thanks for reply.
>> 
>> Our Solr nodes normally use 30-45 GB, hence we allocated a 60 GB heap.
>> We analyzed a heap dump and found that around 85% of the heap was used by
>> org.apache.solr.uninverting.FieldCacheImpl.
>> --------------------
>> One instance of
>> "org.apache.solr.uninverting.FieldCacheImpl" loaded by 
>> "org.eclipse.jetty.webapp.WebAppClassLoader @ 0x48fe5e9b0" occupies 
>> 19,720,415,160 (86.28%) bytes. The memory is accumulated in one instance of 
>> "java.util.HashMap$Node[]" loaded by "<system class loader>".
>> --------------------
>> 
>> Please note we are not using any Solr caches, since in our scenario new
>> documents are added to the index quite fast (at least 10 documents every
>> second) and we need to reopen the searcher to make these new documents
>> available.
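>> 
>> For context, the frequent searcher reopening comes from commit settings
>> along these lines in solrconfig.xml (illustrative values, not our exact
>> config):
>> 
>>   <autoCommit>
>>     <maxTime>60000</maxTime>          <!-- hard commit every 60s -->
>>     <openSearcher>false</openSearcher>
>>   </autoCommit>
>>   <autoSoftCommit>
>>     <maxTime>5000</maxTime>           <!-- soft commit opens a new searcher every 5s -->
>>   </autoSoftCommit>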
>> 
>> We are not using docValues. As per our understanding, using docValues
>> should improve query performance and should also reduce the memory
>> requirement, since we use lots of sorting/faceting in our queries. Please
>> let me know your thoughts on it. Please also suggest if there are any
>> other ways to reduce the memory requirement or optimize performance.
>> 
>> 
>> Regards,
>> 
>> Maulin
>> 
>> -----Original Message-----
>> From: Shawn Heisey <apa...@elyograg.org>
>> Sent: 14 May 2019 01:04
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr node goes into recovery mode
>> 
>> On 5/13/2019 8:26 AM, Maulin Rathod wrote:
>>> Recently we have been observing an issue where a Solr node (any random
>>> node) goes into recovery mode and stops responding.
>> 
>> Do you KNOW that these Solr instances actually need a 60GB heap?  That's a 
>> HUGE heap.  When a full GC happens on a heap that large, it's going to be a 
>> long pause, and there's nothing that can be done about it.
>> 
>>> We have enough memory allocated to Solr (60 GB) and the system also has
>>> enough memory (300 GB)...
>> 
>> As just mentioned, unless you are CERTAIN that you need a 60GB heap, which 
>> most users do not, don't set it that high.  Any advice you read that says 
>> "set the heap to XX percent of the installed system memory"
>> will frequently result in a setting that's incorrect for your specific setup.
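>> 
>> For the record, the heap is set in solr.in.sh, e.g. (the value below is a
>> placeholder, not a recommendation for your setup):
>> 
>>   SOLR_HEAP="16g"
>> 
>> Start lower than you think you need, watch GC behavior under real load,
>> and raise it only if you see actual memory pressure.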
>> 
>> And if you really DO need a 60GB heap, it would be better to either add
>> more servers and put less of your index on each one, or to split your
>> replicas between two Solr instances each running 31GB or less, as Erick
>> mentioned in his reply.  (Staying under 32GB keeps the JVM's compressed
>> object pointers enabled, so each gigabyte of heap goes further.)
>> 
>>> We have analyzed GC logs and found that there was a GC pause of
>>> 29.6583943 seconds when the problem happened. Can this GC pause make the
>>> node unavailable / push it into recovery mode? Or could there be some
>>> other reason?
>> 
>>> Please note we have set zkClientTimeout to 10 minutes
>>> (zkClientTimeout=600000) so that ZooKeeper will not consider this node
>>> unavailable during long GC pauses.
>> 
>> You can't actually set that timeout that high.  I believe that ZooKeeper 
>> limits the session timeout to 20 times the tickTime, which is typically set 
>> to 2 seconds.  So 40 seconds is typically the maximum you can have for that 
>> timeout.  Solr's zkClientTimeout value is used to set ZooKeeper's session 
>> timeout.
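>> 
>> Concretely, the ceiling comes from zoo.cfg; with the usual defaults
>> (values shown for illustration, check your own config):
>> 
>>   # maxSessionTimeout defaults to 20 * tickTime when unset
>>   tickTime=2000
>>   maxSessionTimeout=40000
>> 
>> A zkClientTimeout larger than maxSessionTimeout is simply negotiated down
>> to that maximum when the session is created.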
>> 
>> And, as Erick also mentioned, a long GC pause can cause problems beyond
>> that specific timeout.  SolrCloud is not going to work well with a huge
>> heap ... eventually a full GC is going to happen, and if it takes more
>> than a few seconds, it's going to cause issues.
>> 
>> Thanks,
>> Shawn
>> 
> 
> 
> 
