Great, thanks Erick.

On Mon, 8 Jun 2020 at 13:22, Erick Erickson <erickerick...@gmail.com> wrote:
> It’s _bounded_ by MaxDoc/8 + (some overhead). The overhead is
> both the map overhead and the representation of the query.
>
> This is an upper bound; the full bitset is not stored if there
> are few entries that match the filter. In that case the
> doc IDs are stored instead. Consider: if maxDoc is 1M and only 2 docs
> match the query, it’s much more efficient to store two ints
> rather than 1M/8 bytes.
>
> You can also limit the RAM used by specifying maxRamMB.
>
> Best,
> Erick
>
> > On Jun 8, 2020, at 4:59 AM, Colvin Cowie <colvin.cowie....@gmail.com> wrote:
> >
> > Sorry to hijack this a little bit. Shawn, what's the calculation for the
> > size of the filter cache?
> > Is that 1 bit per document in the core / shard?
> > Thanks
> >
> > On Fri, 5 Jun 2020 at 17:20, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> >> On 6/5/2020 12:17 AM, Srinivas Kashyap wrote:
> >>> q=*:*&fq=PARENT_DOC_ID:100&fq=MODIFY_TS:[1970-01-01T00:00:00Z TO *]&fq=PHY_KEY2:"HQ012206"&fq=PHY_KEY1:"JACK"&rows=1000&sort=MODIFY_TS desc,LOGICAL_SECT_NAME asc,TRACK_ID desc,TRACK_INTER_ID asc,PHY_KEY1 asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4 asc,PHY_KEY5 asc,PHY_KEY6 asc,PHY_KEY7 asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10 asc,FIELD_NAME asc
> >>>
> >>> This was the original query. Since there were a lot of sorting fields, we
> >>> decided not to sort on the Solr side, and instead fetch the query response
> >>> and do the sorting outside Solr. This eliminated the need for the extra JVM
> >>> memory that had been allocated. Every time we ran this query, Solr would
> >>> crash after exceeding the JVM memory. Now we are only running filter queries.
> >>
> >> What Solr version, and what is the definition of each of the fields
> >> you're sorting on? If the definition doesn't include docValues, then a
> >> large on-heap memory structure will be created for sorting (VERY large
> >> with 500 million docs), and I wouldn't be surprised if it's created even
> >> if it is never used.
> >> The definition for any field you use for sorting
> >> should definitely include docValues. In recent Solr versions, docValues
> >> defaults to true for most field types. Some field classes, TextField in
> >> particular, cannot have docValues.
> >>
> >> There's something else to discuss about sort params -- each sort field
> >> will only be used if ALL of the previous sort fields are identical for
> >> two documents in the full numFound result set. Having more than two or
> >> three sort fields is usually pointless. My guess (which I know could be
> >> wrong) is that most queries with this HUGE sort parameter will never use
> >> anything beyond TRACK_ID.
> >>
> >>> And regarding the filter cache, it is the default setup: (we are using
> >>> the default solrconfig.xml, and we have only added the request handler for DIH)
> >>>
> >>> <filterCache class="solr.FastLRUCache"
> >>>              size="512"
> >>>              initialSize="512"
> >>>              autowarmCount="0"/>
> >>
> >> This is way too big for your index, and a prime candidate for why your
> >> heap requirements are so high. Like I said before, if the filterCache
> >> on your system actually reaches this max size, it will require 30GB of
> >> memory JUST for the filterCache on this core. Can you check the admin
> >> UI to determine what the size is and what hit ratio it's getting? (1.0
> >> is 100% on the hit ratio.) I'd probably start with a size of 32 or 64
> >> on this cache. With a size of 64, a little less than 4GB would be the
> >> max heap allocated for the cache. You can experiment... but with 500
> >> million docs, the filterCache size should be pretty small.
> >>
> >> You're going to want to carefully digest this part of that wiki page
> >> that I linked earlier. Hopefully email will preserve this link completely:
> >>
> >> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems#SolrPerformanceProblems-Reducingheaprequirements
> >>
> >> Thanks,
> >> Shawn
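Shawn's docValues advice would look something like the following in the schema. The field types here (`pdate`, `string`) are assumptions based on the field names in the query, not taken from the poster's actual schema:

```xml
<!-- Hypothetical schema sketch: types are illustrative assumptions.
     docValues="true" lets Solr sort using an off-heap, column-oriented
     structure instead of building a large on-heap structure. -->
<field name="MODIFY_TS" type="pdate"  indexed="true" stored="true" docValues="true"/>
<field name="TRACK_ID"  type="string" indexed="true" stored="true" docValues="true"/>
```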
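Combining Shawn's size suggestion with the maxRamMB option Erick mentioned, a reduced filterCache configuration might look like this. The maxRamMB value of 1024 is an illustrative assumption, not a recommendation from the thread; check that your Solr version's cache implementation supports the attribute:

```xml
<!-- Sketch of a smaller filterCache for a ~500M-doc core,
     per Shawn's suggested starting size of 32-64. -->
<filterCache class="solr.FastLRUCache"
             size="64"
             initialSize="64"
             autowarmCount="0"
             maxRamMB="1024"/>
```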
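Erick's per-entry bound and Shawn's heap figures can be sketched with some back-of-envelope arithmetic. This is a rough illustration only, not Solr's actual memory accounting: real entries also carry the map and query-representation overhead Erick mentions.

```python
def bitset_bytes(max_doc: int) -> int:
    # Upper bound per filterCache entry: one bit per document in the core.
    return max_doc // 8

def sparse_bytes(num_matches: int, bytes_per_id: int = 4) -> int:
    # Sparse alternative: store only the matching doc IDs as ints.
    return num_matches * bytes_per_id

# Erick's example: maxDoc = 1M, only 2 docs match the filter.
print(bitset_bytes(1_000_000))   # 125000 bytes for the full bitset
print(sparse_bytes(2))           # 8 bytes for just two doc IDs

# Shawn's figures for this thread: ~500M docs in the core.
per_entry = bitset_bytes(500_000_000)   # 62.5 MB per cached filter
print(per_entry * 512 / 2**30)          # default size=512: ~29.8 GiB worst case
print(per_entry * 64 / 2**30)           # size=64: ~3.7 GiB worst case
```

The 512-entry worst case matches Shawn's "30GB" figure, and the 64-entry case matches his "a little less than 4GB."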