Great, thanks Erick

On Mon, 8 Jun 2020 at 13:22, Erick Erickson <erickerick...@gmail.com> wrote:

> It’s _bounded_ by maxDoc/8 + (some overhead). The overhead is
> both the map overhead and the representation of the query.
>
> This is an upper bound; the full bitset is not stored if
> only a few entries match the filter. In that case the matching
> doc IDs are stored instead. Consider: if maxDoc is 1M and only
> 2 docs match the query, it’s much more efficient to store two
> ints than 1M/8 bytes.
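>
> A rough worked example with those numbers: the full bitset for
> maxDoc = 1M is 1,000,000 / 8 = 125,000 bytes, about 122KB per cache
> entry, while storing the two matching doc IDs as ints takes on the
> order of 8 bytes plus object overhead.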
>
> You can also limit the RAM used by specifying maxRamMB.
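>
> A minimal sketch of what that could look like in solrconfig.xml (the
> maxRamMB value here is just an illustration, and note that only cache
> implementations that support it honor it; LRUCache does, as does
> CaffeineCache in Solr 8.3+):
>
>   <filterCache class="solr.LRUCache"
>                size="512"
>                initialSize="512"
>                autowarmCount="0"
>                maxRamMB="100"/>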
>
> Best,
> Erick
>
> > On Jun 8, 2020, at 4:59 AM, Colvin Cowie <colvin.cowie....@gmail.com> wrote:
> >
> > Sorry to hijack this a little bit. Shawn, what's the calculation for the
> > size of the filter cache?
> > Is that 1 bit per document in the core / shard?
> > Thanks
> >
> > On Fri, 5 Jun 2020 at 17:20, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> >> On 6/5/2020 12:17 AM, Srinivas Kashyap wrote:
> >>> q=*:*&fq=PARENT_DOC_ID:100&fq=MODIFY_TS:[1970-01-01T00:00:00Z TO
> >>> *]&fq=PHY_KEY2:"HQ012206"&fq=PHY_KEY1:"JACK"&rows=1000&sort=MODIFY_TS
> >>> desc,LOGICAL_SECT_NAME asc,TRACK_ID desc,TRACK_INTER_ID asc,PHY_KEY1
> >>> asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4 asc,PHY_KEY5 asc,PHY_KEY6
> >>> asc,PHY_KEY7 asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10 asc,FIELD_NAME asc
> >>>
> >>> This was the original query. Since there were a lot of sort fields, we
> >>> decided not to sort on the Solr side, and instead to fetch the query
> >>> response and do the sorting outside Solr. This eliminated the need for
> >>> the extra JVM memory that had been allocated. Every time we ran this
> >>> query, Solr would crash by exceeding the JVM heap. Now we are only
> >>> running filter queries.
> >>
> >> What Solr version, and what is the definition of each of the fields
> >> you're sorting on?  If the definition doesn't include docValues, then a
> >> large on-heap memory structure will be created for sorting (VERY large
> >> with 500 million docs), and I wouldn't be surprised if it's created even
> >> if it is never used.  The definition for any field you use for sorting
> >> should definitely include docValues.  In recent Solr versions, docValues
> >> defaults to true for most field types.  Some field classes, TextField in
> >> particular, cannot have docValues.
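> >>
> >> As a hypothetical illustration (field names taken from the query
> >> above, the types are my assumption), sort fields with docValues
> >> enabled would look something like this in the schema:
> >>
> >>   <field name="MODIFY_TS" type="pdate" indexed="true" stored="true"
> >>          docValues="true"/>
> >>   <field name="TRACK_ID" type="string" indexed="true" stored="true"
> >>          docValues="true"/>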
> >>
> >> There's something else to discuss about sort params -- each sort field
> >> will only be used if ALL of the previous sort fields are identical for
> >> two documents in the full numFound result set.  Having more than two or
> >> three sort fields is usually pointless.  My guess (which I know could be
> >> wrong) is that most queries with this HUGE sort parameter will never use
> >> anything beyond TRACK_ID.
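> >>
> >> To make that concrete (this is my guess at a trimmed version, verify
> >> against your actual data), a sort like this would likely produce the
> >> same ordering in practice:
> >>
> >> sort=MODIFY_TS desc,LOGICAL_SECT_NAME asc,TRACK_ID desc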
> >>
> >>> And regarding the filter cache, it is the default setup (we are
> >>> using the default solrconfig.xml, and we have only added the request
> >>> handler for DIH):
> >>>
> >>> <filterCache class="solr.FastLRUCache"
> >>>                  size="512"
> >>>                  initialSize="512"
> >>>                  autowarmCount="0"/>
> >>
> >> This is way too big for your index, and a prime candidate for why your
> >> heap requirements are so high.  Like I said before, if the filterCache
> >> on your system actually reaches this max size, it will require 30GB of
> >> memory JUST for the filterCache on this core.  Can you check the admin
> >> UI to determine what the size is and what hit ratio it's getting? (1.0
> >> is 100% on the hit ratio).  I'd probably start with a size of 32 or 64
> >> on this cache.  With a size of 64, a little less than 4GB would be the
> >> max heap allocated for the cache.  You can experiment... but with 500
> >> million docs, the filterCache size should be pretty small.
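> >>
> >> For illustration, a trimmed-down cache along those lines (a starting
> >> point to experiment from, not a one-size-fits-all recommendation)
> >> might be:
> >>
> >>   <filterCache class="solr.FastLRUCache"
> >>                size="64"
> >>                initialSize="64"
> >>                autowarmCount="0"/>
> >>
> >> The arithmetic: 500 million docs / 8 = 62,500,000 bytes, roughly 60MB
> >> per full-bitset entry, so 512 entries is about 30GB and 64 entries is
> >> a little under 4GB.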
> >>
> >> You're going to want to carefully digest this part of that wiki page
> >> that I linked earlier.  Hopefully email will preserve this link
> >> completely:
> >>
> >> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems#SolrPerformanceProblems-Reducingheaprequirements
> >>
> >> Thanks,
> >> Shawn
> >>
>
>
