Hi Erick,

Thank you for your detailed answer, I better understand autowarming.


We have an autowarming time of ~10s for filterCache (queryResultCache is not 
used at all, ratio = 0.02).

We increased the size of the filterCache from 6k to 12k (and autowarming size 
set to same values) to have a better ratio which is _only_ around 0.85/0.90.


The thing I don't understand is I should see "Opening new searcher" in the logs 
everytime a new searcher is opened and thus an autowarming happens, right?

But I don't see "Opening new searcher" very often, and I don't see it being 
correlated with the response time peaks.


Also, I didn't mention it earlier but, we have other SolrCloud clusters with 
similar settings and load (~10s filterCache autowarming, 10k entries) and we 
don't observe the same behavior.


Regards,

________________________________
De : Erick Erickson <erickerick...@gmail.com>
Envoyé : lundi 14 janvier 2019 17:44:38
À : solr-user
Objet : Re: Delayed/waiting requests

Gael:

bq. Nevertheless, our filterCache is set to autowarm 12k entries which
is also the maxSize

That is far, far, far too many. Let's assume you actually have 12K
entries in the filterCache.
Every time you open a new searcher, 12K queries are executed _before_
the searcher
accepts any new requests. While being able to re-use a filterCache
entry is useful, one of
the primary purposes is to pre-load index data from disk into memory
which can be
the event that takes the most time.

The queryResultCache has a similar function. I often find that this
cache doesn't have a
very high hit ratio, but again executing a _few_ of these queries
warms the index from
disk.

I think of both caches as a map, where the key is the "thing", (fq
clause in the case
of filterCache, the whole query in the case of the queryResultCache).
Autowarming
replays the most recently executed N of these entries, essentially
just as though
they were submitted by a user.

Hypothesis: You're massively over-warming, and when that kicks in you're seeing
increased CPU and GC pressure leading to the anomalies you're seeing. Further,
you have such excessive autowarming going on that it's hard to see the
associated messages in the log.

Here's what I'd recommend: Set your autowarm counts to something on the order
of 16. If the culprit is just excessive autowarming, I'd expect your spikes to
be much less severe. It _might_ be that your users see some increased (very
temporary) variance in response time. You can tell that the autowarming
configurations are "more art than science", I can't give you any other
recommendations than "start small and increase until you're happy"
unfortunately.

I usually do this with some kind of load tester in a dev lab of course ;).

Finally, if you use the metrics data (see:
https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html)
you can see the autowarm times. Don't get too lost in the page to
start, just hit the "http://localhost:8983/solr/admin/metrics"; endpoint
and look for "warmupTime", then refine on how to get _only_
the warmup stats ;).

Best,
Erick

On Mon, Jan 14, 2019 at 5:08 AM Gael Jourdan-Weil
<gael.jourdan-w...@kelkoogroup.com> wrote:
>
> I had a look to GC logs this morning but I'm not sure how to interpret them.
>
>
> Over a period of 54mn, there is:
>
> - Number of pauses: 2739
>
> - Accumulated pauses: 93s => that is 2.86% of the time
>
> - Average pause duration: 0.03s
>
> - Average pause interval: 1.18s
>
> - Accumulated full GC: 0
>
> I'm not sure if this is a lot or not. What do you think ?
>
>
> Looking more closely to GC logs with GC Viewer, I can notice that the high 
> response time peaks happens at the same time where GC pauses takes 2x more 
> time (around 0.06s) than average.
>
>
> Also we are indeed indexing at the same time but we have autowarming set.
>
> I don't see any Searcher opened at the time we experience slowness.
>
> Nevertheless, our filterCache is set to autowarm 12k entries which is also 
> the maxSize.
>
> Could this have any downside?
>
>
> Thanks,
>
> Gaël
>
>
> ________________________________
> De : Erick Erickson <erickerick...@gmail.com>
> Envoyé : vendredi 11 janvier 2019 17:21
> À : solr-user
> Objet : Re: Delayed/waiting requests
>
> Jimi's comment is one of the very common culprits.
>
> Autowarming is another. Are you indexing at the same
> time? If so it could well be  you aren't autowarming and
> the spikes are caused by using a new IndexSearcher
> that has to read much of the index off disk when commits
> happen. The "smoking gun" here would be if the spikes
> correlate to your commits (soft or hard-with-opensearcher-true).
>
> Best,
> Erick
>
> On Fri, Jan 11, 2019 at 1:23 AM Gael Jourdan-Weil
> <gael.jourdan-w...@kelkoogroup.com> wrote:
> >
> > Interesting indeed, we did not see anything with VisualVM but having a look 
> > at the GC logs could gives us more info, especially on the pauses.
> >
> > I will collect data over the week-end and look at it.
> >
> >
> > Thanks
> >
> > ________________________________
> > De : Hullegård, Jimi <jimi.hulleg...@svensktnaringsliv.se>
> > Envoyé : vendredi 11 janvier 2019 03:46:02
> > À : solr-user@lucene.apache.org
> > Objet : Re: Delayed/waiting requests
> >
> > Could be caused by garbage collection in the jvm.
> >
> > https://wiki.apache.org/solr/SolrPerformanceProblems
> >
> > Go down to the segment called “GC pause problems”
> >
> > /Jimi
> >
> > Sent from my iPhone
> >
> > On 11 Jan 2019, at 05:05, Gael Jourdan-Weil 
> > <gael.jourdan-w...@kelkoogroup.com<mailto:gael.jourdan-w...@kelkoogroup.com>>
> >  wrote:
> >
> > Hello,
> >
> > We are experiencing some performance issues on a simple SolrCloud cluster 
> > of 3 replicas (1 core) but what we found during our analysis seems a bit 
> > odd, so we thought the community could have relevant ideas on this.
> >
> > Load: between 30 and 40 queries per second, constant over time of analysis
> >
> > Symptoms: high response time over short period of time but quite frequently.
> > We are talking about requests response time going from 50ms to 5000ms or 
> > even worse during less than 5 seconds, and then going back to normal.
> >
> > What we found out: just before response time increase, requests seems to be 
> > delayed.
> > That is during 2/3 seconds, requests pile up, no response is sent, and then 
> > all requests are resolved and responses are all returned to the clients at 
> > the same time.
> > Very much like if there was a lock happening somewhere. But we found no 
> > "lock" time nor at JVM or system level.
> >
> > Does someone can think of something that could explain this in the way Solr 
> > works ?
> > Or ideas to track down the root cause..
> >
> > Solr version is 7.2.1.
> >
> > Thanks for reading,
> >
> > Gaël Jourdan-Weil
> >
> > Svenskt Näringsliv behandlar dina personuppgifter i enlighet med GDPR. Här 
> > kan du läsa mer om vår behandling och dina rättigheter, 
> > Integritetspolicy<https://www.svensktnaringsliv.se/dataskydd/integritet-och-behandling-av-personuppgifter_697219.html?utm_source=sn-email&utm_medium=email>

Reply via email to