I think DIH is the wrong solution for this. An external custom load will 
probably be much faster.
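
A minimal SolrJ bulk-load sketch (the base URL, core name, and queue/thread 
settings are assumptions you would tune; ConcurrentUpdateSolrClient queues 
documents and sends them in batches from background threads):

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkLoad {
    public static void main(String[] args) throws Exception {
        try (ConcurrentUpdateSolrClient client =
                 new ConcurrentUpdateSolrClient.Builder("http://localhost:8983/solr/<corename>")
                     .withQueueSize(10000)
                     .withThreadCount(4)
                     .build()) {
            // iterate over your source rows here and map each one to a document
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("PARENT_DOC_ID", "100");
            doc.addField("MODIFY_TS", "2020-06-05T00:00:00Z");
            client.add(doc);   // queued and flushed in batches by the client
            client.commit();   // one commit at the end of the load, not per document
        }
    }
}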

You have too much JVM heap, from my point of view. Reduce it to 8 GB or 
similar.
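
For example, in solr.in.sh (the exact file location depends on your install):

SOLR_HEAP="8g"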

It seems you are just exporting data, so you are better off using the export 
handler.
Add docValues to the fields for this. It looks like you have no text field to 
be searched, only simple fields (string, date, etc.).
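
A sketch of what that looks like, using your field names (docValues must be 
enabled before indexing, so this needs a reindex; /export requires docValues 
on every field used in fl and sort):

<field name="MODIFY_TS" type="date" indexed="true" stored="true" docValues="true" omitTermFreqAndPositions="true" />

http://localhost:8983/solr/<corename>/export?q=*:*&fq=PARENT_DOC_ID:100&sort=MODIFY_TS+desc&fl=PARENT_DOC_ID,MODIFY_TS,PHY_KEY1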

You should not use the normal search handler to return many results at once. 
If you cannot use the export handler, then use cursors:

https://lucene.apache.org/solr/guide/8_4/pagination-of-results.html#using-cursors

Both let you work through large, sorted result sets without holding the whole set in memory.
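
A cursor walk is a sketch like this (it assumes your uniqueKey field is named 
"id"; cursors require the sort to end on the uniqueKey):

http://localhost:8983/solr/<corename>/select?q=*:*&fq=PARENT_DOC_ID:100&sort=MODIFY_TS+desc,id+asc&rows=1000&cursorMark=*

Take nextCursorMark from each response and send it as cursorMark on the next 
request; stop when nextCursorMark equals the cursorMark you sent.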

> On 05.06.2020 at 08:18, Srinivas Kashyap 
> <srini...@bamboorose.com.invalid> wrote:
> 
> Thanks Shawn,
> 
> The filter queries are not complex. Below is the query, with the filter 
> queries I'm running against the corresponding schema entries:
> 
> q=*:*
> &fq=PARENT_DOC_ID:100
> &fq=MODIFY_TS:[1970-01-01T00:00:00Z TO *]
> &fq=PHY_KEY2:"HQ012206"
> &fq=PHY_KEY1:"JACK"
> &rows=1000
> &sort=MODIFY_TS desc,LOGICAL_SECT_NAME asc,TRACK_ID desc,TRACK_INTER_ID asc,
>       PHY_KEY1 asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4 asc,PHY_KEY5 asc,
>       PHY_KEY6 asc,PHY_KEY7 asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10 asc,
>       FIELD_NAME asc
> 
> This was the original query. Since there were a lot of sort fields, we 
> decided not to sort on the Solr side, but instead to fetch the query 
> response and sort outside Solr. This eliminated the need for the additional 
> JVM memory that had been allocated; every time we ran the original query, 
> Solr would crash after exceeding the JVM heap. Now we are only running 
> filter queries.
> 
> And regarding the filter cache, it is the default setup (we are using the 
> default solrconfig.xml and have only added the request handler for DIH):
> 
> <filterCache class="solr.FastLRUCache"
>                 size="512"
>                 initialSize="512"
>                 autowarmCount="0"/>
> 
> Now that you're aware of the size and numbers, can you please let me know 
> which values/sizes I need to increase? Is there an advantage to moving this 
> single core to SolrCloud? If yes, can you let us know how many 
> shards/replicas we would require for this core, considering we allow it to 
> grow as users transact? The updates to this core are not made through DIH 
> delta import; rather, we are using SolrJ to push the changes.
> 
> <schema.xml>
> <field name="PARENT_DOC_ID" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
> <field name="MODIFY_TS"     type="date"   indexed="true" stored="true" omitTermFreqAndPositions="true" />
> <field name="PHY_KEY1"      type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
> <field name="PHY_KEY2"      type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
> <field name="PHY_KEY3"      type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
> <field name="PHY_KEY4"      type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
> <field name="PHY_KEY5"      type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
> <field name="PHY_KEY6"      type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
> <field name="PHY_KEY7"      type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
> <field name="PHY_KEY8"      type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
> <field name="PHY_KEY9"      type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
> <field name="PHY_KEY10"     type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
> 
> 
> Thanks,
> Srinivas
> 
> 
> 
>> On 6/4/2020 9:51 PM, Srinivas Kashyap wrote:
>> We are on Solr 8.4.1, in standalone server mode. We have a core with 
>> 497,767,038 records indexed. It took around 32 hours to load the data through DIH.
>> 
>> The disk occupancy is shown below:
>> 
>> 82G /var/solr/data/<corename>/data/index
>> 
>> When I restarted the Solr instance and went to this core to query in the Solr 
>> admin GUI, it hangs and shows "Connection to Solr lost. Please check the 
>> Solr instance". But when I go back to the dashboard, the instance is up and 
>> I'm able to query other cores.
>> 
>> Also, querying this core eats up the allocated JVM memory (24 GB of 32 GB 
>> RAM). A query (*:*) with filter queries overshoots the memory with an OOM.
> 
> You're going to want a lot more than 8GB of available memory for
> disk caching with an 82GB index. That's a performance thing... with so
> little caching memory, Solr will be slow, but functional. That aspect
> of your setup will NOT lead to out-of-memory errors.
> 
> If you are experiencing Java "OutOfMemoryError" exceptions, you will
> need to figure out what resource is running out. It might be heap
> memory, but it also might be that you're hitting the process/thread
> limit of your operating system. And there are other possible causes for
> that exception too. Do you have the text of the exception available?
> It will be absolutely critical for you to determine what resource is
> running out, or you might focus your efforts on the wrong thing.
> 
> If it's heap memory (something that I can't really assume), then Solr is
> requiring more than the 24GB heap you've allocated.
> 
> Do you have faceting or grouping on those queries? Are any of your
> filters really large or complex? These are the things that I would
> imagine as requiring lots of heap memory.
> 
> What is the size of your filterCache? With about 500 million documents
> in the core, each entry in the filterCache will consume nearly 60
> megabytes of memory. If your filterCache has the default example size
> of 512, and it actually gets that big, then that single cache will
> require nearly 30 gigabytes of heap memory (on top of the other things
> in Solr that require heap) ... and you only have 24GB. That could cause
> OOME exceptions.
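> 
> (To spell out that arithmetic: a large filterCache entry is a bitset with 
> one bit per document, so 497,767,038 bits / 8 is about 62,000,000 bytes, 
> i.e. nearly 60 MB per entry; 512 entries at ~60 MB each is roughly 30 GB.)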
> 
> Does the server run things other than Solr?
> 
> Look here for some valuable info about performance and memory:
> 
> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems
> 
> Thanks,
> Shawn