Hey Erick,
Firstly - thank you so much for your detailed response - it is really 
appreciated!
Unfortunately some of the context of my original message was lost because the 
screenshots weren't included.
The additional latency spike does absolutely result in a poor user experience 
for us. Some of our legacy applications hit Solr quite a few times in order to 
render the client experience, so the compound effect can take a search result 
render from 500ms to 3-4 seconds for a chunk of our users every 10 minutes.

I know I'll never get this down to 0; I'm just striving to make what changes 
are feasible without going down too much of a rabbit hole.  Please note I'm 
relatively new to Solr and have inherited a legacy stack.

The memory footprint is lower because I also reduced the cache size, not just 
the warming value.  The warmup time is now under one second, which I'm good with.
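
For reference, the cache settings now look roughly like this (the values here 
are illustrative rather than our exact production numbers):

   <filterCache class="solr.FastLRUCache"
                size="512"
                initialSize="512"
                cleanupThread="true"
                autowarmCount="16"/>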

I am working through the static warming queries today with one of the teams, so 
hopefully that will also have an impact.

I will look at the docValues as well.

Thanks again
Karl


On 30/01/2020, 00:24, "Erick Erickson" <erickerick...@gmail.com> wrote:

    Autowarming is significantly misunderstood. One of its purposes in “the 
bad old days” was to rebuild very expensive on-heap structures for 
searching/sorting/grouping/and function queries.

    These are exactly what docValues are designed to make much, much faster.
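
    As a sketch, a field used for sorting/faceting with docValues enabled 
would look like this in the schema (field name hypothetical):

        <field name="price" type="plong" indexed="true" stored="false" docValues="true"/>

    With docValues, those structures live in column-oriented on-disk files 
that are memory-mapped, rather than being rebuilt on the heap for every new 
searcher.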

    If you are still using spinning disks, the other benefit of warming queries 
is to read the index off disk and into MMapDirectory space. SSDs make this much 
faster too.


    I often see two common mistakes:
    1> no autowarming
    2> excessive autowarming

    I usually recommend people start with autowarm counts in the 10-20 range.

    One implication of what you’ve said so far is that the additional 9 seconds 
your old autowarming took didn’t get you any benefit either, so putting it back 
isn’t indicated. I’m not quite clear why you say your memory footprint is 
lower; it’s unrelated to autowarming unless you also decreased your size 
parameter. If you’re saying that your reduced cache size hasn’t changed your 
95th percentile, I’d keep reducing it until it _did_ have a measurable effect.

    The hit ratio is only loosely related to autowarming. So focusing on 
autowarming as a way to improve the hit ratio is probably the wrong focus.

    So the first thing I’d do is make very, very sure that all the fields 
used for grouping/sorting/faceting/function operations have docValues. Second, a 
static warming query that ensures this, rather than relying on autowarming of 
the queryResultCache to happen to exercise those functions, would be another 
step. NOTE: you don’t have to do all those operations on every field; just 
sorting on each field would suffice. NOTE: as of Solr 7.6, you can set 
“uninvertible=false” on your field types to ensure docValues are actually in 
place (queries will fail rather than silently uninvert on the heap), see: 
SOLR-12962
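
    A static warming query along those lines, sorting once per docValues 
field, might look like this (field names hypothetical):

        <listener event="newSearcher" class="solr.QuerySenderListener">
          <arr name="queries">
            <lst><str name="q">*:*</str><str name="sort">price asc</str></lst>
            <lst><str name="q">*:*</str><str name="sort">make asc</str></lst>
          </arr>
        </listener>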

    And then I’d ask how much effort is smoothing out that kind of spike worth? 
You certainly see it with monitoring tools, but do users notice at all? If not, 
I wouldn’t spend all that much effort pursuing it…

    Best,
    Erick


    > On Jan 29, 2020, at 4:48 PM, Karl Stoney 
<karl.sto...@autotrader.co.uk.INVALID> wrote:
    >
    > So, interestingly, tweaking my filterCache I've got the warming time down 
to 1s (from 10!) and also reduced my memory footprint due to the smaller cache 
size.
    >
    > However, I still get these latency spikes (these changes have made no 
difference to them).
    >
    > So the theory about them being due to the warming being too intensive is 
wrong.
    >
    > I know the images didn't load, btw, so when I say spike I mean p95 
response time going from 50ms to 100-120ms momentarily.
    > ________________________________
    > From: Walter Underwood <wun...@wunderwood.org>
    > Sent: 29 January 2020 21:30
    > To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
    > Subject: Re: Solr Searcher 100% Latency Spike
    >
    > Looking at the log, that takes one or two seconds after a complete batch 
reload (master/slave). So that is loading a cold index, all new files. This is 
not a big index, about a half million book titles.
    >
    > wunder
    > Walter Underwood
    > wun...@wunderwood.org
    > http://observer.wunderwood.org/  (my blog)
    >
    >> On Jan 29, 2020, at 1:21 PM, Karl Stoney 
<karl.sto...@autotrader.co.uk.INVALID> wrote:
    >>
    >> Out of curiosity, could you define "fast"?
    >> I'm wondering what sort of figures people target their searcher warm 
time at
    >> ________________________________
    >> From: Walter Underwood <wun...@wunderwood.org>
    >> Sent: 29 January 2020 21:13
    >> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
    >> Subject: Re: Solr Searcher 100% Latency Spike
    >>
    >> I use a static set of warming queries, about 20 of them. That is fast 
and gets a decent amount of the index into file buffers. Your top queries won’t 
change much unless you have a news site or a seasonal business.
    >>
    >> Like this:
    >>
    >>   <listener event="newSearcher" class="solr.QuerySenderListener">
    >>     <arr name="queries">
    >>       <lst>
    >>         <!-- Top non-numeric query words from August 2011 rush -->
    >>         <str name="q">introduction</str>
    >>         <str name="q">intermediate</str>
    >>         <str name="q">fundamentals</str>
    >>         <str name="q">understanding</str>
    >>         <str name="q">introductory</str>
    >>         <str name="q">precalculus</str>
    >>         <str name="q">foundations</str>
    >>         <str name="q">microeconomics</str>
    >>         <str name="q">microbiology</str>
    >>         <str name="q">macroeconomics</str>
    >>         <str name="q">discovering</str>
    >>         <str name="q">international</str>
    >>         <str name="q">mathematics</str>
    >>         <str name="q">organizational</str>
    >>         <str name="q">criminology</str>
    >>         <str name="q">developmental</str>
    >>         <str name="q">engineering</str>
    >>       </lst>
    >>     </arr>
    >>   </listener>
    >>
    >> wunder
    >> Walter Underwood
    >> wun...@wunderwood.org
    >> http://observer.wunderwood.org/  (my blog)
    >>
    >>> On Jan 29, 2020, at 1:01 PM, Shawn Heisey <apa...@elyograg.org> wrote:
    >>>
    >>> On 1/29/2020 12:44 PM, Karl Stoney wrote:
    >>>> Looking for a bit of support here.  When we soft commit (every 10 
minutes), we get a latency spike that means response times for Solr are roughly 
double, as you can see in this screenshot:
    >>>
    >>> Attachments almost never make it to the list.  We cannot see any of 
your screenshots.
    >>>
    >>>> They do correlate to filterCache warmup, which seem to take between 
10s and 30s:
    >>>> We don't have any other caches enabled, due to the high level of 
cardinality of the queries.
    >>>> The spikes are specifically on /select
    >>>> We have the following autowarm configuration for the filterCache:
    >>>>       <filterCache class="solr.FastLRUCache"
    >>>>                    size="8192"
    >>>>                    initialSize="8192"
    >>>>                    cleanupThread="true"
    >>>>                    autowarmCount="900"/>
    >>>
    >>> Autowarm, especially on filterCache, can be an extremely lengthy 
process.  What Solr must do in order to warm the cache here is execute up to 
900 queries, sequentially, on the new index.  That can take a lot of time and 
use a lot of resources like CPU and I/O.
    >>>
    >>> In order to reduce the impact of cache warming, I had to reduce my own 
autowarmCount on the filterCache to 4.
    >>>
    >>> Thanks,
    >>> Shawn
    >>
    >> This e-mail is sent on behalf of Auto Trader Group Plc, Registered 
Office: 1 Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in 
England No. 9439967). This email and any files transmitted with it are 
confidential and may be legally privileged, and intended solely for the use of 
the individual or entity to whom they are addressed. If you have received this 
email in error please notify the sender. This email message has been swept for 
the presence of computer viruses.
    >



