Re: facets & docValues

ART GALLERY Wed, 13 May 2020 09:59:13 -0700



On Thu, May 7, 2020 at 8:49 PM Joel Bernstein <joels...@gmail.com> wrote:
>
> You can be pretty sure that adding static warming queries will improve your
> performance following soft commits. But opening new searchers every 2
> seconds may be too fast to allow for warming, so you may need to adjust. As
> a general rule, you cannot open searchers faster than you can warm them.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
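A minimal sketch of the setup described above, assuming a hypothetical facet
field named category (the thread does not name the actual fields). The
newSearcher listener belongs inside <query> in solrconfig.xml and the soft
commit interval inside <updateHandler>. The 2000 ms value mirrors the 2-second
requirement mentioned in the thread and would likely need to grow if warming
takes longer than that.

```xml
<!-- solrconfig.xml fragment (sketch; the field name "category" is hypothetical) -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoSoftCommit>
    <!-- 2 seconds, as in the thread; raise this if warming cannot finish in time -->
    <maxTime>2000</maxTime>
  </autoSoftCommit>
</updateHandler>

<query>
  <!-- Sends these queries to each new searcher so its facet structures are warmed -->
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="rows">0</str>
        <str name="facet">true</str>
        <str name="facet.field">category</str>
      </lst>
    </arr>
  </listener>
</query>
```

One warming entry per heavily used facet is usually enough to start with; the
warm-up time reported for each new searcher in the Solr log can then be
compared against the soft commit interval.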
>
> On Tue, May 5, 2020 at 5:54 PM Revas <revas2...@gmail.com> wrote:
>
> > Hi Joel, no, we have not. We have a softCommit requirement of 2 secs.
> >
> > On Tue, May 5, 2020 at 3:31 PM Joel Bernstein <joels...@gmail.com> wrote:
> >
> > > > Have you configured static warming queries for the facets? This will warm
> > > > the cache structures for the facet fields. You just want to make sure your
> > > > commits are spaced far enough apart that the warming completes before a
> > > > new searcher starts warming.
> > >
> > >
> > > Joel Bernstein
> > > http://joelsolr.blogspot.com/
> > >
> > >
> > > On Mon, May 4, 2020 at 10:27 AM Revas <revas2...@gmail.com> wrote:
> > >
> > > > > Hi Erick, thanks for the explanation and advice. With facet queries,
> > > > > do docValues help at all?
> > > >
> > > > 1) indexed=true, docValues=true =>  all facets
> > > >
> > > > 2)
> > > >
> > > > >    - indexed=true, docValues=true => only for subfacets
> > > > >    - indexed=true, docValues=false => facet query
> > > > >    - docValues=true, indexed=false => term facets
> > > >
> > > >
> > > >
> > > > > In case of 1 above => indexing slowed considerably; overall facet
> > > > > performance improved manyfold.
> > > > > In case of 2 => overall performance showed only slight
> > > > > improvement.
> > > >
> > > > > Does that mean turning on docValues even for facet queries helps improve
> > > > > performance, i.e. fetching from docValues for a facet query is faster than
> > > > > fetching from stored fields?
> > > >
> > > > Thanks
> > > >
> > > >
> > > > > On Thu, Apr 16, 2020 at 1:50 PM Erick Erickson <erickerick...@gmail.com>
> > > > > wrote:
> > > >
> > > > > > DocValues should help when faceting over fields, i.e. facet.field=blah.
> > > > > >
> > > > > > I would expect docValues to help with sub facets, but don’t know
> > > > > > the code well enough to say definitely one way or the other.
> > > > >
> > > > > > The empirical approach would be to set "uninvertible=false" (Solr 7.6+)
> > > > > > and turn docValues off. What that means is that if any operation tries to
> > > > > > uninvert the index on the Java heap, you’ll get an exception like:
> > > > > > "can not sort on a field w/o docValues unless it is indexed=true
> > > > > > uninvertible=true and the type supports Uninversion:"
> > > > >
> > > > > See SOLR-12962
> > > > >
> > > > > > Speed is only one issue. The entire point of docValues is to not
> > > > > > “uninvert” the field on the heap. This used to lead to very significant
> > > > > > memory
> > > > > pressure. So when turning docValues off, you run the risk of
> > > > > reverting back to the old behavior and having unexpected memory
> > > > > consumption, not to mention slowdowns when the uninversion
> > > > > takes place.
> > > > >
> > > > > > Also, unless your documents are very large, this is a tiny corpus. It can
> > > > > > be quite hard to get realistic numbers; the signal gets lost in the noise.
> > > > >
> > > > > > You should only shard when your individual query times exceed your
> > > > > > requirement. Say you have a 95th-percentile requirement of 1 second
> > > > > > response time.
> > > > > >
> > > > > > Let’s further say that you can meet that requirement with 50 queries/second,
> > > > > > but when you get to 75 queries/second your response time exceeds your
> > > > > > requirements. Do NOT shard at this point. Add another replica instead.
> > > > > > As a general rule, sharding adds inevitable overhead and should only be
> > > > > > considered when you can’t get adequate response time even under fairly
> > > > > > light query loads.
> > > > >
> > > > > Best,
> > > > > Erick
> > > > >
> > > > > > On Apr 16, 2020, at 12:08 PM, Revas <revas2...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Erick, you are correct, we have only about 1.8M documents so far, and
> > > > > > > turning on indexing for the facet fields greatly improved the timings of
> > > > > > > the facet query (which has sub facets and facet queries). So do
> > > > > > > docValues help at all for sub facets and facet queries? Our tests
> > > > > > > showed further query time improvement when we turned off
> > > > > > > docValues. Is that the right approach?
> > > > > > >
> > > > > > > Currently we have only 1 shard, and we are thinking of scaling by
> > > > > > > increasing the number of shards when we see a deterioration in query
> > > > > > > time. Any suggestions?
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > > > On Wed, Apr 15, 2020 at 8:21 AM Erick Erickson <erickerick...@gmail.com>
> > > > > > > wrote:
> > > > > >
> > > > > >> In a word, “yes”. I also suspect your corpus isn’t very big.
> > > > > >>
> > > > > >> I think the key is the facet queries. Now, I’m talking from
> > > > > >> theory rather than diving into the code, but querying on
> > > > > >> a docValues=true, indexed=false field is really doing a
> > > > > >> search. And searching on a field like that is effectively
> > > > > >> analogous to a table scan. Even if somehow an internal
> > > > > >> structure would be constructed to deal with it, it would
> > > > > >> probably be on the heap, where you don’t want it.
> > > > > >>
> > > > > >> So the test would be to take the queries out and measure
> > > > > >> performance, but I think that’s the root issue here.
> > > > > >>
> > > > > >> Best,
> > > > > >> Erick
> > > > > >>
> > > > > >>> On Apr 14, 2020, at 11:51 PM, Revas <revas2...@gmail.com> wrote:
> > > > > >>>
> > > > > > >>> We have faceting fields that have been defined as indexed=false,
> > > > > > >>> stored=false and docValues=true.
> > > > > > >>>
> > > > > > >>> However, we use a lot of subfacets using JSON facets, and facet
> > > > > > >>> ranges using facet.query. We see that after every soft commit our
> > > > > > >>> performance worsens, and it is ideal between commits.
> > > > > > >>>
> > > > > > >>> How is it that docValues fields are affected by soft commits, and do we
> > > > > > >>> need to enable indexing if we use subfacets and facet queries to improve
> > > > > > >>> performance?
> > > > > >>>
> > > > > >>> Tha
> > > > > >>
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
