Sorry to hijack this a little bit. Shawn, what's the calculation for the
size of the filter cache?
Is that 1 bit per document in the core / shard?
Thanks

On Fri, 5 Jun 2020 at 17:20, Shawn Heisey <apa...@elyograg.org> wrote:

> On 6/5/2020 12:17 AM, Srinivas Kashyap wrote:
> > q=*:*&fq=PARENT_DOC_ID:100&fq=MODIFY_TS:[1970-01-01T00:00:00Z TO
> > *]&fq=PHY_KEY2:"HQ012206"&fq=PHY_KEY1:"JACK"&rows=1000&sort=MODIFY_TS
> > desc,LOGICAL_SECT_NAME asc,TRACK_ID desc,TRACK_INTER_ID asc,PHY_KEY1
> > asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4 asc,PHY_KEY5 asc,PHY_KEY6
> > asc,PHY_KEY7 asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10 asc,FIELD_NAME asc
> >
> > This was the original query. Since there were a lot of sort fields, we
> > decided not to sort on the Solr side, and instead fetch the query response
> > and sort it outside Solr. This eliminated the need for the additional JVM
> > memory that had been allocated. Every time we ran this query, Solr would
> > crash after exceeding the JVM memory. Now we are only running filter queries.
>
> What Solr version, and what is the definition of each of the fields
> you're sorting on?  If the definition doesn't include docValues, then a
> large on-heap memory structure will be created for sorting (VERY large
> with 500 million docs), and I wouldn't be surprised if it's created even
> if it is never used.  The definition for any field you use for sorting
> should definitely include docValues.  In recent Solr versions, docValues
> defaults to true for most field types.  Some field classes, TextField in
> particular, cannot have docValues.
>
> There's something else to discuss about sort params -- each sort field
> will only be used if ALL of the previous sort fields are identical for
> two documents in the full numFound result set.  Having more than two or
> three sort fields is usually pointless.  My guess (which I know could be
> wrong) is that most queries with this HUGE sort parameter will never use
> anything beyond TRACK_ID.
>
> > And regarding the filter cache, it is the default setup (we are using the
> > default solrconfig.xml, and have only added the request handler for DIH):
> >
> > <filterCache class="solr.FastLRUCache"
> >                   size="512"
> >                   initialSize="512"
> >                   autowarmCount="0"/>
>
> This is way too big for your index, and a prime candidate for why your
> heap requirements are so high.  Like I said before, if the filterCache
> on your system actually reaches this max size, it will require 30GB of
> memory JUST for the filterCache on this core.  Can you check the admin
> UI to determine what the size is and what hit ratio it's getting? (1.0
> is 100% on the hit ratio).  I'd probably start with a size of 32 or 64
> on this cache.  With a size of 64, a little less than 4GB would be the
> max heap allocated for the cache.  You can experiment... but with 500
> million docs, the filterCache size should be pretty small.
>
> You're going to want to carefully digest this part of that wiki page
> that I linked earlier.  Hopefully email will preserve this link completely:
>
>
> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems#SolrPerformanceProblems-Reducingheaprequirements
>
> Thanks,
> Shawn
>
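
To illustrate the point above about sort params, here is a minimal sketch of how a multi-field sort decides the order of two documents. The helper name `deciding_field` and the sample documents are made up for illustration; only the field names come from the query in this thread, and this is not how Solr implements sorting internally.

```python
from typing import Optional, Sequence


def deciding_field(a: dict, b: dict, sort_fields: Sequence[str]) -> Optional[str]:
    """Return the first sort field on which two documents differ, or None if all tie."""
    for field in sort_fields:
        if a[field] != b[field]:
            return field  # later sort fields are never consulted for this pair
    return None


# First three sort fields from the query in this thread.
SORT_FIELDS = ["MODIFY_TS", "LOGICAL_SECT_NAME", "TRACK_ID"]

# Hypothetical documents, invented for this example.
doc_a = {"MODIFY_TS": "2020-06-05T00:00:00Z", "LOGICAL_SECT_NAME": "A", "TRACK_ID": 1}
doc_b = {"MODIFY_TS": "2020-06-05T00:00:00Z", "LOGICAL_SECT_NAME": "A", "TRACK_ID": 2}
doc_c = {"MODIFY_TS": "2020-06-04T00:00:00Z", "LOGICAL_SECT_NAME": "B", "TRACK_ID": 3}

print(deciding_field(doc_a, doc_c, SORT_FIELDS))  # differs on the first field
print(deciding_field(doc_a, doc_b, SORT_FIELDS))  # ties on two fields, decided by the third
```

A field far down the list only matters for pairs of documents that tie on every field before it, which is why long sort lists rarely change the result order.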

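For reference, the figures quoted above (30GB at size 512, a little less than 4GB at size 64) are consistent with one bit per document per cache entry. Here is a rough sketch of that arithmetic, assuming every entry is a full bitset over all documents in the core and ignoring per-entry overhead. The function name is made up for illustration. As far as I know, Solr can store filters that match few documents more compactly, so treat this as an upper bound and not an exact figure.

```python
def filter_cache_max_bytes(max_doc: int, cache_size: int) -> int:
    """Worst-case bytes for a filterCache, assuming one bit per document per entry."""
    bytes_per_entry = (max_doc + 7) // 8  # round up to whole bytes
    return bytes_per_entry * cache_size


MAX_DOC = 500_000_000  # document count mentioned in the thread

for size in (512, 64, 32):
    total = filter_cache_max_bytes(MAX_DOC, size)
    print(f"size={size}: {total / 1e9:.1f} GB ({total / 2**30:.2f} GiB)")
```

With 500 million documents each entry is 62,500,000 bytes, so 512 entries come to 32,000,000,000 bytes (roughly 29.8 GiB) and 64 entries to 4,000,000,000 bytes (roughly 3.7 GiB).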