Hi Shawn,

We're equally impressed by how well the server is handling it. We're using Sematext for monitoring, and the load on the box has held steady under 1 with no swapping.

We are 100% certain the traffic is coming from the 3 web hosts running this code. We have put some custom logging in place that logs all requests to an access-style log and stores that data in Kibana/Logstash. In Logstash we are able to confirm that all these requests (~40 million in the last 12 hours) are coming from our web front ends directly to a single box in the cluster. Our client code is on separate servers from our Solr servers, and zk has its own boxes as well.

Here's a scrubbed pastebin of the cluster status response from the machine that is getting all the traffic. I pulled this via browser on my local machine.

https://pastebin.com/42haKVME

We can attempt to update the SolrJ dependency on our lower env and see if that fixes the problem, if you think that's a good course of action. We are also in the midst of switching over to HTTP Client to resolve the production issues we are seeing ASAP, so I can't promise a timeline. If you think there's a chance the upgrade will fix this, we could of course give it a quick go.

-TZ

On 11/6/18, 12:35 PM, "Shawn Heisey" <apa...@elyograg.org> wrote:

>On 11/6/2018 10:12 AM, Zimmermann, Thomas wrote:
>> Shawn -
>>
>> Server performance is fine and request times are great. We are tolerating
>> the level of traffic, but the server that is taking all the hits is
>> obviously performing a bit slower than the others. Response times are
>> under 5MS avg for queries on all servers, which is within our perf
>> thresholds.
>
>I was asking specifically about the clusterstatus requests -- whether
>the response looks complete if you manually execute the same request and
>whether it returns quickly. And I'd like to see the solr.log where
>these are happening.
>
>Knowing that requests in general are performing well is good info,
>although I have no idea how that is possible on the node that is getting
>over a thousand clusterstatus requests per second. I would expect that
>node to be essentially dead under that much load. Since it's apparently
>handling it fine ... that's really impressive.
>
>> We are running 7.4 on the client and server side, moving to 7.5 was
>> troublesome for us so we are holding off for the time being.
>
>I was hoping you could just upgrade the SolrJ client, which would
>involve either replacing the solrj jar or bumping the version number in
>the config for a dependency manager (things like ivy, maven, gradle,
>etc). A 7.5 client should be pretty safe against 7.4 servers. The
>client would be newer than the server and very close to the same
>version, which is the general recommendation for CloudSolrClient when
>the two versions cannot be identical for some reason.
>
>Are you absolutely sure that those requests are coming from the program
>with CloudSolrClient? To find out, you'll need to enable the request
>log in jetty.xml (it just needs to be un-commented) and restart the
>server. The source address is not logged in solr.log. It's very
>important to be absolutely sure where the requests are coming from. If
>you're running the client code on the same machine as one of your Solr
>servers, it will be difficult to be sure about the source, so I would
>definitely suggest running the client code on a completely different
>machine, so the source addresses in the request log are useful.
>
>Thanks,
>Shawn
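
For anyone following the thread, the source-address check discussed above could be sketched roughly as follows. This is only an illustration, not the logging setup described in this message: it assumes the request log lines are in an NCSA-style access log layout (client address first, then the quoted request line), and the sample lines, addresses, and the function name `count_clusterstatus_by_source` are made up for the example.

```python
import re
from collections import Counter

# Assumed layout (NCSA-style): addr ident user [timestamp] "METHOD target PROTO" status size
LINE_RE = re.compile(
    r'^(?P<addr>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<target>\S+)'
)


def count_clusterstatus_by_source(lines):
    """Count CLUSTERSTATUS requests per client address in access-log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        # Compare case-insensitively; the action parameter may vary in case.
        if "action=clusterstatus" in match.group("target").lower():
            counts[match.group("addr")] += 1
    return counts


# Hypothetical sample lines, invented purely for illustration.
SAMPLE = [
    '10.0.0.11 - - [06/Nov/2018:17:00:01 +0000] "GET /solr/admin/collections?action=CLUSTERSTATUS&wt=javabin HTTP/1.1" 200 4096',
    '10.0.0.12 - - [06/Nov/2018:17:00:01 +0000] "GET /solr/admin/collections?action=CLUSTERSTATUS&wt=javabin HTTP/1.1" 200 4096',
    '10.0.0.11 - - [06/Nov/2018:17:00:02 +0000] "GET /solr/admin/collections?action=CLUSTERSTATUS&wt=javabin HTTP/1.1" 200 4096',
    '10.0.0.13 - - [06/Nov/2018:17:00:02 +0000] "GET /solr/products/select?q=*:* HTTP/1.1" 200 512',
]

if __name__ == "__main__":
    for addr, hits in count_clusterstatus_by_source(SAMPLE).most_common():
        print(addr, hits)
```

Running the same function over the real request log from the busy node (for example, passing an open file object instead of `SAMPLE`) would show whether the clusterstatus calls come only from the web front ends.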