My suspicion, as others have said, is that you simply have too much data on
too little hardware. Solr definitely should not be taking this long. Or rather,
if Solr is taking this long to start up, you have a badly undersized system, and
until you address that you’ll just be going ’round in circles.

Lucene uses MMapDirectory to map almost all of the actual index into OS memory
space, see:
https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
You have 82G of index, but only about 8G of OS memory space left to hold it.
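
For what it’s worth, that mapping is the default on 64-bit systems. A minimal
sketch of the relevant solrconfig.xml entry (this is the stock setting, so
nothing should need changing here):

<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

NRTCachingDirectoryFactory delegates to an mmap-based directory on 64-bit JVMs,
so the fix is more OS memory, not a different directory implementation.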

It’s certainly worth looking at how you use your index and whether you can
make it smaller, but I’d say you simply won’t get satisfactory performance on
such constrained hardware.

You really need to go through “the sizing exercise” to see what your hardware
can support under your usage patterns, see:
https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

> On Jun 5, 2020, at 3:48 AM, Srinivas Kashyap 
> <srini...@bamboorose.com.INVALID> wrote:
> 
> Hi Jörn,
> 
> I think you missed my explanation. We are not using sorting now:
> 
> The original query:
> 
> q=*:*&fq=PARENT_DOC_ID:100
> &fq=MODIFY_TS:[1970-01-01T00:00:00Z TO *]
> &fq=PHY_KEY2:"HQ012206"&fq=PHY_KEY1:"JACK"
> &rows=1000
> &sort=MODIFY_TS desc,LOGICAL_SECT_NAME asc,TRACK_ID desc,TRACK_INTER_ID asc,
>   PHY_KEY1 asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4 asc,PHY_KEY5 asc,
>   PHY_KEY6 asc,PHY_KEY7 asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10 asc,
>   FIELD_NAME asc
> 
> But now I have removed the sorting, as shown below. The sorting is being done
> outside Solr:
> 
> q=*:*&fq=PARENT_DOC_ID:100
> &fq=MODIFY_TS:[1970-01-01T00:00:00Z TO *]
> &fq=PHY_KEY2:"HQ012206"&fq=PHY_KEY1:"JACK"
> &rows=1000
> 
> Also, we are discarding DIH and writing custom indexing code instead. When I
> restart Solr, this core with huge data takes a long time to even show up in
> the admin GUI query console: around 2 hours.
> 
> My question is: even the simple query with the filter queries shown above
> is consuming JVM memory. How much memory, or what configuration in
> solrconfig.xml, do I need to make it work?
> 
> Thanks,
> Srinivas
> 
> From: Jörn Franke <jornfra...@gmail.com>
> Sent: 05 June 2020 12:30
> To: solr-user@lucene.apache.org
> Subject: Re: Solr takes time to warm up core with huge data
> 
> I think DIH is the wrong solution for this. If you do an external custom load
> you will probably be much faster.
> 
> You have too much JVM memory, from my point of view. Reduce it to 8GB or
> similar.
> 
> It seems you are just exporting data, so you are better off using the export
> handler.
> Add docValues to the fields for this. It looks like you have no text field to
> be searched, only simple fields (string, date, etc.).
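> 
> A minimal sketch (field names taken from this thread; host, port, and core
> name are placeholders): the /export handler needs docValues="true" on every
> field you sort on or return, e.g.
> 
> <field name="PHY_KEY1" type="string" indexed="true" stored="true"
>        docValues="true" omitTermFreqAndPositions="true" />
> 
> and then a request along the lines of:
> 
> http://localhost:8983/solr/corename/export?q=*:*&fq=PARENT_DOC_ID:100&sort=MODIFY_TS desc&fl=PARENT_DOC_ID,MODIFY_TS,PHY_KEY1,PHY_KEY2
> 
> Keep in mind that adding docValues to an existing field requires a full
> reindex.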
> 
> You should not use the normal handler to return many results at once. If you
> cannot use the export handler, then use cursors:
> 
> https://lucene.apache.org/solr/guide/8_4/pagination-of-results.html#using-cursors
> 
> Both approaches let you walk large, sorted result sets without consuming the
> whole memory.
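> 
> A minimal cursor sketch (assuming your uniqueKey field is named "id"; cursors
> require the uniqueKey as a final sort field to break ties):
> 
> q=*:*&fq=PARENT_DOC_ID:100&rows=1000&sort=id asc&cursorMark=*
> 
> Each response includes a nextCursorMark; pass it back as cursorMark on the
> next request, and stop when the value you sent comes back unchanged.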
> 
>> On Jun 5, 2020, at 8:18 AM, Srinivas Kashyap
>> <srini...@bamboorose.com.invalid> wrote:
>> 
>> Thanks Shawn,
>> 
>> The filter queries are not complex. Below are the filter queries I’m running 
>> for the corresponding schema entry:
>> 
>> q=*:*&fq=PARENT_DOC_ID:100
>> &fq=MODIFY_TS:[1970-01-01T00:00:00Z TO *]
>> &fq=PHY_KEY2:"HQ012206"&fq=PHY_KEY1:"JACK"
>> &rows=1000
>> &sort=MODIFY_TS desc,LOGICAL_SECT_NAME asc,TRACK_ID desc,TRACK_INTER_ID asc,
>>   PHY_KEY1 asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4 asc,PHY_KEY5 asc,
>>   PHY_KEY6 asc,PHY_KEY7 asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10 asc,
>>   FIELD_NAME asc
>> 
>> This was the original query. Since there were a lot of sorting fields, we
>> decided not to do the sorting on the Solr side, but to fetch the query
>> response and sort it outside Solr. This eliminated the need for the extra
>> JVM memory that had been allocated: every time we ran the original query,
>> Solr would crash after exceeding the JVM memory. Now we are only running
>> filter queries.
>> 
>> And regarding the filter cache, it is the default setup (we are using the
>> default solrconfig.xml and have only added the request handler for DIH):
>> 
>> <filterCache class="solr.FastLRUCache"
>>              size="512"
>>              initialSize="512"
>>              autowarmCount="0"/>
>> 
>> Now that you’re aware of the size and numbers, can you please let me know
>> which values/sizes I need to increase? Is there an advantage to moving this
>> single core to SolrCloud? If yes, can you let us know how many
>> shards/replicas we would require for this core, considering we allow it to
>> grow as users transact? The updates to this core do not go through DIH
>> delta import; rather, we are using SolrJ to push the changes.
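>> 
>> A minimal SolrJ sketch of such a push (URL, core name, and field values are
>> placeholders, not our actual code):
>> 
>> import java.io.IOException;
>> import org.apache.solr.client.solrj.SolrClient;
>> import org.apache.solr.client.solrj.SolrServerException;
>> import org.apache.solr.client.solrj.impl.HttpSolrClient;
>> import org.apache.solr.common.SolrInputDocument;
>> 
>> public class PushChanges {
>>     public static void main(String[] args) throws SolrServerException, IOException {
>>         SolrClient client = new HttpSolrClient.Builder(
>>                 "http://localhost:8983/solr/corename").build();
>>         SolrInputDocument doc = new SolrInputDocument();
>>         doc.addField("PARENT_DOC_ID", "100");  // example values only
>>         doc.addField("PHY_KEY1", "JACK");
>>         client.add(doc);    // send the update to Solr
>>         client.commit();    // make it visible to searches
>>         client.close();
>>     }
>> }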
>> 
>> <schema.xml>
>> <field name="PARENT_DOC_ID" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
>> <field name="MODIFY_TS" type="date" indexed="true" stored="true" omitTermFreqAndPositions="true" />
>> <field name="PHY_KEY1" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
>> <field name="PHY_KEY2" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
>> <field name="PHY_KEY3" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
>> <field name="PHY_KEY4" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
>> <field name="PHY_KEY5" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
>> <field name="PHY_KEY6" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
>> <field name="PHY_KEY7" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
>> <field name="PHY_KEY8" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
>> <field name="PHY_KEY9" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
>> <field name="PHY_KEY10" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
>> 
>> 
>> Thanks,
>> Srinivas
>> 
>> 
>> 
>>> On 6/4/2020 9:51 PM, Srinivas Kashyap wrote:
>>> We are on Solr 8.4.1 in standalone server mode. We have a core with
>>> 497,767,038 records indexed. It took around 32 hours to load the data
>>> through DIH.
>>> 
>>> The disk occupancy is shown below:
>>> 
>>> 82G /var/solr/data/<corename>/data/index
>>> 
>>> When I restarted the Solr instance and went to this core to query in the
>>> Solr admin GUI, it hangs and shows "Connection to Solr lost. Please check
>>> the Solr instance". But when I go back to the dashboard, the instance is
>>> up and I'm able to query other cores.
>>> 
>>> Also, querying this core is eating up the allocated JVM memory (24GB heap
>>> on a 32GB-RAM machine). A query (*:*) with filter queries is overshooting
>>> the memory with an OOM.
>> 
>> You're going to want to have a lot more than 8GB available memory for
>> disk caching with an 82GB index. That's a performance thing... with so
>> little caching memory, Solr will be slow, but functional. That aspect
>> of your setup will NOT lead to out of memory.
>> 
>> If you are experiencing Java "OutOfMemoryError" exceptions, you will
>> need to figure out what resource is running out. It might be heap
>> memory, but it also might be that you're hitting the process/thread
>> limit of your operating system. And there are other possible causes for
>> that exception too. Do you have the text of the exception available?
>> It will be absolutely critical for you to determine what resource is
>> running out, or you might focus your efforts on the wrong thing.
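>> 
>> As a pointer, the usual JVM wording (not Solr-specific): heap exhaustion
>> typically reports "java.lang.OutOfMemoryError: Java heap space", while
>> hitting the OS process/thread limit typically reports
>> "java.lang.OutOfMemoryError: unable to create new native thread". The exact
>> message tells you which resource to chase.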
>> 
>> If it's heap memory (something that I can't really assume), then Solr is
>> requiring more than the 24GB heap you've allocated.
>> 
>> Do you have faceting or grouping on those queries? Are any of your
>> filters really large or complex? These are the things that I would
>> imagine as requiring lots of heap memory.
>> 
>> What is the size of your filterCache? With about 500 million documents
>> in the core, each entry in the filterCache will consume nearly 60
>> megabytes of memory. If your filterCache has the default example size
>> of 512, and it actually gets that big, then that single cache will
>> require nearly 30 gigabytes of heap memory (on top of the other things
>> in Solr that require heap) ... and you only have 24GB. That could cause
>> OOME exceptions.
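>> 
>> The arithmetic, roughly: each filterCache entry is a bitset with one bit
>> per document, so 497,767,038 docs / 8 = nearly 60 megabytes per entry, and
>> 512 entries times 60 megabytes is close to 30 gigabytes. If you stay on the
>> standard handler, a much smaller cache is one lever; a sketch for
>> solrconfig.xml (size it to your real fq reuse):
>> 
>> <filterCache class="solr.FastLRUCache"
>>              size="64"
>>              initialSize="64"
>>              autowarmCount="0"/>
>> 
>> At size 64 the worst case for this core is around 4 gigabytes of heap.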
>> 
>> Does the server run things other than Solr?
>> 
>> Look here for some valuable info about performance and memory:
>> 
>> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems
>> 
>> Thanks,
>> Shawn