Ok, I get your point.

Do you know if there is a tool to easily view filterCache content?

I know we can see the top entries in the API or the UI but could we see more?


Regards,

Gaël

________________________________
From: Erick Erickson <erickerick...@gmail.com>
Sent: Tuesday, January 15, 2019 19:46:19
To: solr-user
Subject: Re: Re: Delayed/waiting requests

bq. If I get your point, having a big cache might cause more trouble
than help if the cache hit ratio is not high enough, because the cache
is constantly evicting/inserting entries?

Pretty much. Although there are nuances.

Right now, you have a 12K autowarm count. That means your cache will
eventually always contain 12K entries whether or not you ever use the
last 11K! I'm simplifying a bit, but it grows like this.

Let's say I start Solr. Initially it has no cache entries. Now I start
both querying and indexing. For simplicity, say 100 _new_ fq
clauses come in between each commit. The first commit will autowarm
100. The next will autowarm 200, then 300, and so on. Eventually this
will grow to 12K. So your performance will start to vary depending on
how long Solr has been running.
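
For reference, the knob in question is the autowarmCount attribute on the
cache definition in solrconfig.xml. A minimal sketch of the kind of change
I mean (size kept at your current 12K, warming only a handful of entries;
the exact numbers here are illustrative, not a tuned recommendation):

    <filterCache class="solr.FastLRUCache"
                 size="12288"
                 initialSize="512"
                 autowarmCount="16"/>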

Worse, it's not clear that you _ever_ re-use those clauses. One example:
fq=date_field:[* TO NOW]
NOW is really a Unix timestamp, so issuing the same fq 1 millisecond
after the first one will not re-use the entry. In the worst case almost
all of your autowarming is useless. It neither loads relevant index
data into RAM nor is reusable.

Even if you use "date math" to round to, say, a minute, if you run
Solr long enough you'll still fill up with useless fq clauses.
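
To make that concrete, compare these two illustrative clauses (date_field
is just the placeholder field name from the example above):

    fq=date_field:[* TO NOW]          <- NOW changes every millisecond, so
                                         this entry is essentially never re-used
    fq=date_field:[* TO NOW/MINUTE]   <- re-usable for up to a minute, but a
                                         new entry still lands every minute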

Best,
Erick

On Tue, Jan 15, 2019 at 9:33 AM Gael Jourdan-Weil
<gael.jourdan-w...@kelkoogroup.com> wrote:
>
> @Erick:
>
>
> We will try to lower the autowarm and run some tests to compare.
>
> If I get your point, having a big cache might cause more trouble than help 
> if the cache hit ratio is not high enough, because the cache is constantly 
> evicting/inserting entries?
>
>
>
> @Jeremy:
>
>
> Index size: ~20G and ~14M documents
>
> Server memory available: 256G, of which ~30G is used and ~100G is system cache
>
> Server CPU count: 32, ~10% usage
>
> JVM memory settings: -Xms12G -Xmx12G
>
>
> We have 3 servers and 3 clusters of 3 Solr instances.
>
> That is, each server hosts 1 Solr instance for each cluster.
>
> And, indeed, each cluster only has 1 shard with replication factor 3.
>
>
> Among all these Solr instances, the pauses are observed on only one 
> cluster, but on every server, at different times (sometimes on all servers at 
> the same time, but I would say that's very rare).
>
> We do observe the traffic is evenly balanced across the 3 servers, around 
> 30-40 queries per second sent to each server.
>
>
>
> Regards,
>
> Gaël
>
>
> ________________________________
> From: Branham, Jeremy (Experis) <jb...@allstate.com>
> Sent: Tuesday, January 15, 2019 17:59:56
> To: solr-user@lucene.apache.org
> Subject: Re: Re: Delayed/waiting requests
>
> Hi Gael –
>
> Could you share this information?
> Size of the index
> Server memory available
> Server CPU count
> JVM memory settings
>
> You mentioned a cloud configuration of 3 replicas.
> Does that mean you have 1 shard with a replication factor of 3?
> Do the pauses occur on all 3 servers?
> Is the traffic evenly balanced across those servers?
>
>
> Jeremy Branham
> jb...@allstate.com
>
>
> On 1/15/19, 9:50 AM, "Erick Erickson" <erickerick...@gmail.com> wrote:
>
>     Well, it was a nice theory anyway.
>
>     "Other collections with the same settings"
>     doesn't really mean much unless those other collections are very similar,
>     especially in terms of numbers of docs.
>
>     You should only see a new searcher opening when you do a
>     hard-commit-with-opensearcher-true or soft commit.
>
>     So what happens when you just try lowering the autowarm
>     count? I'm assuming you're free to test in some non-prod
>     system.
>
>     Focusing on the hit ratio is something of a red herring. Remember
>     that each entry in your filterCache is roughly maxDoc/8 bytes plus a
>     little overhead; the increase in GC pressure has to be balanced
>     against getting the hits from the cache.
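>
>     As a back-of-the-envelope sketch for your numbers (worst case, i.e.
>     every entry stored as a full bitset): 14M docs / 8 is about 1.75MB per
>     entry, so 12K entries could in principle hold on to something like
>     12,000 * 1.75MB, roughly 21GB of heap. In practice many entries are
>     smaller, but it shows why hit ratio alone isn't the whole story.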
>
>     Now, all that said if there's no correlation, then you need to put
>     a profiler on the system when you see this kind of thing and
>     find out where the hotspots are, otherwise it's guesswork and
>     I'm out of ideas.
>
>     Best,
>     Erick
>
>     On Tue, Jan 15, 2019 at 12:06 AM Gael Jourdan-Weil
>     <gael.jourdan-w...@kelkoogroup.com> wrote:
>     >
>     > Hi Erick,
>     >
>     >
>     > Thank you for your detailed answer, I better understand autowarming.
>     >
>     >
>     > We have an autowarming time of ~10s for the filterCache (the 
> queryResultCache is not used at all, ratio = 0.02).
>     >
>     > We increased the size of the filterCache from 6k to 12k (with the 
> autowarm size set to the same value) to get a better hit ratio, which is 
> still _only_ around 0.85/0.90.
>     >
>     >
>     > The thing I don't understand is that I should see "Opening new searcher" in 
> the logs every time a new searcher is opened and thus autowarming happens, 
> right?
>     >
>     > But I don't see "Opening new searcher" very often, and I don't see it 
> being correlated with the response time peaks.
>     >
>     >
>     > Also, I didn't mention it earlier but, we have other SolrCloud clusters 
> with similar settings and load (~10s filterCache autowarming, 10k entries) 
> and we don't observe the same behavior.
>     >
>     >
>     > Regards,
>     >
>     > ________________________________
>     > From: Erick Erickson <erickerick...@gmail.com>
>     > Sent: Monday, January 14, 2019 17:44:38
>     > To: solr-user
>     > Subject: Re: Delayed/waiting requests
>     >
>     > Gael:
>     >
>     > bq. Nevertheless, our filterCache is set to autowarm 12k entries which
>     > is also the maxSize
>     >
>     > That is far, far, far too many. Let's assume you actually have 12K
>     > entries in the filterCache.
>     > Every time you open a new searcher, 12K queries are executed _before_
>     > the searcher
>     > accepts any new requests. While being able to re-use a filterCache
>     > entry is useful, one of
>     > the primary purposes is to pre-load index data from disk into memory
>     > which can be
>     > the event that takes the most time.
>     >
>     > The queryResultCache has a similar function. I often find that this
>     > cache doesn't have a
>     > very high hit ratio, but again executing a _few_ of these queries
>     > warms the index from
>     > disk.
>     >
>     > I think of both caches as a map, where the key is the "thing" (the fq
>     > clause in the case
>     > of the filterCache, the whole query in the case of the queryResultCache).
>     > Autowarming
>     > replays the most recently executed N of these entries, essentially
>     > just as though
>     > they were submitted by a user.
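>     >
>     > Conceptually (just a sketch of the idea, not Solr's actual internal
>     > types):
>     >
>     >     filterCache:      fq clause    ->  set of matching documents
>     >     queryResultCache: whole query  ->  ordered list of top-N doc ids
>     >
>     > Autowarming simply re-runs the most recent keys against the new
>     > searcher to repopulate these maps.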
>     >
>     > Hypothesis: You're massively over-warming, and when that kicks in 
> you see
>     > increased CPU and GC pressure, leading to the anomalies you're observing. 
> Further,
>     > there is so much autowarming going on that it's hard to see the
>     > associated messages in the log.
>     >
>     > Here's what I'd recommend: Set your autowarm counts to something on the 
> order
>     > of 16. If the culprit is just excessive autowarming, I'd expect your 
> spikes to
>     > be much less severe. It _might_ be that your users see some increased 
> (very
>     > temporary) variance in response time. You can tell that autowarming
>     > configuration is "more art than science"; I can't give you any other
>     > recommendation than "start small and increase until you're happy",
>     > unfortunately.
>     >
>     > I usually do this with some kind of load tester in a dev lab of course 
> ;).
>     >
>     > Finally, if you use the metrics data (see:
>     > https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html)
>     > you can see the autowarm times. Don't get too lost in the page to
>     > start, just hit the "http://localhost:8983/solr/admin/metrics" endpoint
>     > and look for "warmupTime", then refine on how to get _only_
>     > the warmup stats ;).
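>     >
>     > For example, something like this (the exact metric names can vary a
>     > bit between versions, so treat these as a starting point rather than
>     > the definitive incantation):
>     >
>     >     curl "http://localhost:8983/solr/admin/metrics?group=core&prefix=SEARCHER.searcher.warmupTime"
>     >     curl "http://localhost:8983/solr/admin/metrics?group=core&prefix=CACHE.searcher.filterCache"
>     >
>     > The second one should include the filterCache's own warmupTime along
>     > with its hit ratio and size.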
>     >
>     > Best,
>     > Erick
>     >
>     > On Mon, Jan 14, 2019 at 5:08 AM Gael Jourdan-Weil
>     > <gael.jourdan-w...@kelkoogroup.com> wrote:
>     > >
>     > > I had a look at the GC logs this morning, but I'm not sure how to 
> interpret them.
>     > >
>     > >
>     > > Over a period of 54mn, there is:
>     > >
>     > > - Number of pauses: 2739
>     > >
>     > > - Accumulated pauses: 93s => that is 2.86% of the time
>     > >
>     > > - Average pause duration: 0.03s
>     > >
>     > > - Average pause interval: 1.18s
>     > >
>     > > - Accumulated full GC: 0
>     > >
>     > > I'm not sure if this is a lot or not. What do you think?
>     > >
>     > >
>     > > Looking more closely at the GC logs with GCViewer, I notice that the 
> high response time peaks happen at the same time as GC pauses that take 2x 
> longer (around 0.06s) than average.
>     > >
>     > >
>     > > Also we are indeed indexing at the same time but we have autowarming 
> set.
>     > >
>     > > I don't see any Searcher opened at the time we experience slowness.
>     > >
>     > > Nevertheless, our filterCache is set to autowarm 12k entries which is 
> also the maxSize.
>     > >
>     > > Could this have any downside?
>     > >
>     > >
>     > > Thanks,
>     > >
>     > > Gaël
>     > >
>     > >
>     > > ________________________________
>     > > From: Erick Erickson <erickerick...@gmail.com>
>     > > Sent: Friday, January 11, 2019 17:21
>     > > To: solr-user
>     > > Subject: Re: Delayed/waiting requests
>     > >
>     > > Jimi's comment is one of the very common culprits.
>     > >
>     > > Autowarming is another. Are you indexing at the same
>     > > time? If so, it could well be that you aren't autowarming and
>     > > the spikes are caused by using a new IndexSearcher
>     > > that has to read much of the index off disk when commits
>     > > happen. The "smoking gun" here would be if the spikes
>     > > correlate to your commits (soft or hard-with-opensearcher-true).
>     > >
>     > > Best,
>     > > Erick
>     > >
>     > > On Fri, Jan 11, 2019 at 1:23 AM Gael Jourdan-Weil
>     > > <gael.jourdan-w...@kelkoogroup.com> wrote:
>     > > >
>     > > > Interesting indeed, we did not see anything with VisualVM, but 
> having a look at the GC logs could give us more info, especially on the 
> pauses.
>     > > >
>     > > > I will collect data over the weekend and look at it.
>     > > >
>     > > >
>     > > > Thanks
>     > > >
>     > > > ________________________________
>     > > > From: Hullegård, Jimi <jimi.hulleg...@svensktnaringsliv.se>
>     > > > Sent: Friday, January 11, 2019 03:46:02
>     > > > To: solr-user@lucene.apache.org
>     > > > Subject: Re: Delayed/waiting requests
>     > > >
>     > > > Could be caused by garbage collection in the jvm.
>     > > >
>     > > > https://wiki.apache.org/solr/SolrPerformanceProblems
>     > > >
>     > > > Go down to the segment called “GC pause problems”
>     > > >
>     > > > /Jimi
>     > > >
>     > > > Sent from my iPhone
>     > > >
>     > > > On 11 Jan 2019, at 05:05, Gael Jourdan-Weil 
> <gael.jourdan-w...@kelkoogroup.com> wrote:
>     > > >
>     > > > Hello,
>     > > >
>     > > > We are experiencing some performance issues on a simple SolrCloud 
> cluster of 3 replicas (1 core) but what we found during our analysis seems a 
> bit odd, so we thought the community could have relevant ideas on this.
>     > > >
>     > > > Load: between 30 and 40 queries per second, constant over the period 
> analyzed
>     > > >
>     > > > Symptoms: high response times over short periods of time, but quite 
> frequently.
>     > > > We are talking about request response times going from 50ms to 
> 5000ms or even worse for less than 5 seconds, and then going back to 
> normal.
>     > > >
>     > > > What we found out: just before the response time increases, requests 
> seem to be delayed.
>     > > > That is, for 2-3 seconds requests pile up and no response is sent, 
> and then all requests are resolved and responses are all returned to the 
> clients at the same time.
>     > > > Very much as if there was a lock happening somewhere. But we 
> found no "lock" time at either the JVM or system level.
>     > > >
>     > > > Can someone think of something in the way Solr works that could 
> explain this?
>     > > > Or ideas to track down the root cause?
>     > > >
>     > > > Solr version is 7.2.1.
>     > > >
>     > > > Thanks for reading,
>     > > >
>     > > > Gaël Jourdan-Weil
>     > > >
>
>
