On 6/5/2020 12:17 AM, Srinivas Kashyap wrote:
q=*:*&fq=PARENT_DOC_ID:100&fq=MODIFY_TS:[1970-01-01T00:00:00Z TO *]
&fq=PHY_KEY2:"HQ012206"&fq=PHY_KEY1:"JACK"&rows=1000
&sort=MODIFY_TS desc,LOGICAL_SECT_NAME asc,TRACK_ID desc,TRACK_INTER_ID asc,
PHY_KEY1 asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4 asc,PHY_KEY5 asc,
PHY_KEY6 asc,PHY_KEY7 asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10 asc,FIELD_NAME asc

This was the original query. Since there were a lot of sort fields, we decided
not to sort on the Solr side; instead we fetch the query response and do the
sorting outside Solr. This eliminated the need for the extra JVM memory that
had been allocated. Every time we ran this query, Solr would crash after
exceeding the JVM heap. Now we are only running filter queries.

What Solr version, and what is the definition of each of the fields you're sorting on? If the definition doesn't include docValues, then a large on-heap memory structure will be created for sorting (VERY large with 500 million docs), and I wouldn't be surprised if it's created even if it is never used. The definition for any field you use for sorting should definitely include docValues. In recent Solr versions, docValues defaults to true for most field types. Some field classes, TextField in particular, cannot have docValues.
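As an illustration only (the field and type names below are assumptions based on the query, not your actual schema), sortable fields with docValues enabled would look something like this in the schema:

```xml
<!-- Hypothetical schema entries; adjust names and types to your real schema.
     docValues="true" stores sort/facet data in an off-heap, disk-backed
     structure instead of building a large on-heap FieldCache. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>
<fieldType name="pdate" class="solr.DatePointField" docValues="true"/>

<field name="TRACK_ID"  type="string" indexed="true" stored="true" docValues="true"/>
<field name="MODIFY_TS" type="pdate"  indexed="true" stored="true" docValues="true"/>
```

Note that changing docValues on an existing field requires a full reindex before sorting on it again.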

There's something else to discuss about sort params -- each sort field will only be used if ALL of the previous sort fields are identical for two documents in the full numFound result set. Having more than two or three sort fields is usually pointless. My guess (which I know could be wrong) is that most queries with this HUGE sort parameter will never use anything beyond TRACK_ID.
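For example, if TRACK_ID turns out to be unique per document (an assumption worth verifying), no two documents can ever tie on it, so everything after it can never influence the ordering and the sort could be trimmed to:

    sort=MODIFY_TS desc,LOGICAL_SECT_NAME asc,TRACK_ID desc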

And regarding the filter cache, it is the default setup: (we are using the
default solrconfig.xml, and we have only added the request handler for DIH)

<filterCache class="solr.FastLRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>

This is way too big for your index, and a prime candidate for why your heap requirements are so high. Like I said before, if the filterCache on your system actually reaches this max size, it will require 30GB of memory JUST for the filterCache on this core. Can you check the admin UI to determine what the size is and what hit ratio it's getting? (1.0 is 100% on the hit ratio). I'd probably start with a size of 32 or 64 on this cache. With a size of 64, a little less than 4GB would be the max heap allocated for the cache. You can experiment... but with 500 million docs, the filterCache size should be pretty small.
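To sketch the arithmetic behind those numbers (this uses the standard estimate of one bit per document per filterCache entry; actual usage depends on how many distinct filters get cached):

```python
# Rough filterCache heap estimate: each cache entry can be a bitset
# with one bit per document in the core.
num_docs = 500_000_000          # documents in the core, from this thread
bytes_per_entry = num_docs / 8  # one bit per doc -> 62.5 MB per entry

def cache_heap_gb(size):
    """Worst-case heap the filterCache could use at a given size setting."""
    return size * bytes_per_entry / 1024**3

print(cache_heap_gb(512))  # default size 512 -> roughly 30 GB
print(cache_heap_gb(64))   # size 64 -> a little under 4 GB
```

This is why the default size of 512 is dangerous on a very large index: the per-entry cost scales with document count, so the same config that is harmless on a small core can exhaust the heap here.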

You're going to want to carefully digest this part of that wiki page that I linked earlier. Hopefully email will preserve this link completely:

https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems#SolrPerformanceProblems-Reducingheaprequirements

Thanks,
Shawn
