Sorry to hijack this a little bit. Shawn, what's the calculation for the size of the filter cache? Is that 1 bit per document in the core / shard? Thanks
On Fri, 5 Jun 2020 at 17:20, Shawn Heisey <apa...@elyograg.org> wrote:
> On 6/5/2020 12:17 AM, Srinivas Kashyap wrote:
> > q=*:*&fq=PARENT_DOC_ID:100&fq=MODIFY_TS:[1970-01-01T00:00:00Z TO *]&fq=PHY_KEY2:"HQ012206"&fq=PHY_KEY1:"JACK"&rows=1000&sort=MODIFY_TS desc,LOGICAL_SECT_NAME asc,TRACK_ID desc,TRACK_INTER_ID asc,PHY_KEY1 asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4 asc,PHY_KEY5 asc,PHY_KEY6 asc,PHY_KEY7 asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10 asc,FIELD_NAME asc
> >
> > This was the original query. Since there were lot of sorting fields, we
> > decided to not do on the solr side, instead fetch the query response and do
> > the sorting outside solr. This eliminated the need of more JVM memory which
> > was allocated. Every time we ran this query, solr would crash exceeding the
> > JVM memory. Now we are only running filter queries.
>
> What Solr version, and what is the definition of each of the fields
> you're sorting on? If the definition doesn't include docValues, then a
> large on-heap memory structure will be created for sorting (VERY large
> with 500 million docs), and I wouldn't be surprised if it's created even
> if it is never used. The definition for any field you use for sorting
> should definitely include docValues. In recent Solr versions, docValues
> defaults to true for most field types. Some field classes, TextField in
> particular, cannot have docValues.
>
> There's something else to discuss about sort params -- each sort field
> will only be used if ALL of the previous sort fields are identical for
> two documents in the full numFound result set. Having more than two or
> three sort fields is usually pointless. My guess (which I know could be
> wrong) is that most queries with this HUGE sort parameter will never use
> anything beyond TRACK_ID.
>
> > And regarding the filter cache, it is in default setup: (we are using
> > default solrconfig.xml, and we have only added the request handler for DIH)
> >
> > <filterCache class="solr.FastLRUCache"
> >              size="512"
> >              initialSize="512"
> >              autowarmCount="0"/>
>
> This is way too big for your index, and a prime candidate for why your
> heap requirements are so high. Like I said before, if the filterCache
> on your system actually reaches this max size, it will require 30GB of
> memory JUST for the filterCache on this core. Can you check the admin
> UI to determine what the size is and what hit ratio it's getting? (1.0
> is 100% on the hit ratio). I'd probably start with a size of 32 or 64
> on this cache. With a size of 64, a little less than 4GB would be the
> max heap allocated for the cache. You can experiment... but with 500
> million docs, the filterCache size should be pretty small.
>
> You're going to want to carefully digest this part of that wiki page
> that I linked earlier. Hopefully email will preserve this link completely:
>
> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems#SolrPerformanceProblems-Reducingheaprequirements
>
> Thanks,
> Shawn
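
The heap figures quoted above (about 30GB for 512 entries, a little less than 4GB for 64) match a worst-case estimate of one bit per document per cached filter. Here is a minimal sketch of that arithmetic, assuming every cache entry is a full bitset sized to the number of documents in the core. The function name and the 500 million document count plugged in are illustrative only, and Solr may store filters that match few documents more compactly, so treat the result as an upper bound rather than a measurement.

```python
GIB = 1024 ** 3  # bytes per GiB


def filter_cache_bytes(max_doc: int, entries: int) -> int:
    """Worst-case heap for a filter cache.

    Assumption: each cached filter is a bitset with one bit per
    document in the core, rounded up to whole bytes.
    """
    bytes_per_entry = (max_doc + 7) // 8
    return bytes_per_entry * entries


if __name__ == "__main__":
    max_doc = 500_000_000  # document count mentioned in the thread
    for size in (512, 64, 32):
        total = filter_cache_bytes(max_doc, size)
        print(f"size={size}: {total / GIB:.2f} GiB")
    # size=512: 29.80 GiB
    # size=64: 3.73 GiB
    # size=32: 1.86 GiB
```

Under this estimate each cached filter costs about 62.5 MB at 500 million documents, so the configured cache size multiplies directly into heap.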