I use production logs to get a mix of common and long-tail queries. It is very 
hard to get a realistic distribution with synthetic queries.

A benchmark run goes like this, with a big shell script driving it.

1. Reload the collection to clear caches.
2. Split the log into a cache warming set (usually the first 2000 queries) and 
the rest.
3. Run the warming set with four threads and no delay. This gets it done but 
usually does not overload the server.
4. Run the test set with hundreds of threads, each set for a particular rate. 
The overall config is usually between 2000 and 10,000 requests per minute.
5. Tests run for 1-2 hours.
6. Grep the results for non-200 responses, filter them out, and report.
7. Post process the results to make a CSV file of the percentile response 
times, one column for each request handler.
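Steps 6 and 7 can be sketched in a few lines of shell. The column layout below (timestamp, elapsed, label, response code) is a simplified stand-in for a real JMeter JTL file, not its actual format, and the per-handler 95th percentile is taken with a plain nearest-rank index:

```shell
# Simplified stand-in for a JMeter results file (column layout is assumed).
cat > results.csv <<'EOF'
ts,elapsed,label,code
1,40,/select,200
2,55,/select,200
3,900,/select,503
4,70,/suggest,200
5,30,/select,200
6,65,/suggest,200
EOF

# Step 6: keep only 200 responses. Step 7: nearest-rank p95 per handler.
awk -F, 'NR > 1 && $4 == 200 { print $3 "," $2 }' results.csv |
sort -t, -k1,1 -k2,2n |
awk -F, '
  { v[$1, ++n[$1]] = $2 }
  END {
    for (h in n) {
      i = int(0.95 * n[h]); if (i < 1) i = 1
      printf "%s,p95=%s\n", h, v[h, i]
    }
  }'
```

With only a handful of rows the p95 index degenerates, but on a one- to two-hour run each handler has thousands of samples and the nearest-rank estimate is stable.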

The benchmark driver is a headless JMeter, run with two different config files 
(warming and test). The post processing is a JMeter add-on.
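A minimal skeleton of such a driver might look like the following; the file names, the reload URL, and the assumption that the thread counts and rates live inside the two .jmx files are all guesses for illustration, not the actual script:

```shell
# Hypothetical driver skeleton, written to a file rather than executed here
# (host, core name, and .jmx file names are assumptions, not the real config).
cat > run-bench.sh <<'EOF'
#!/bin/sh
set -e
# 1. Reload the collection to clear caches (URL is an assumption).
curl -s "http://solr-host:8983/solr/admin/cores?action=RELOAD&core=main"
# 2-3. Warming pass: headless JMeter, 4 threads, no delay (set in warming.jmx).
jmeter -n -t warming.jmx -l warming.jtl
# 4-5. Timed test pass: thread groups pinned to target rates (set in test.jmx).
jmeter -n -t test.jmx -l test.jtl
EOF
chmod +x run-bench.sh
```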

If the CPU gets over about 60%, or the run queue approaches the number of 
processors, the hosts are near congestion. Response times spike if the cluster 
is pushed harder than that.
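As a rough sketch of the run-queue half of that check on a Linux host, one could compare the 1-minute load average (a proxy for the run queue) against the processor count; the CPU-percent half would need mpstat or similar, and is not shown:

```shell
# Rough run-queue check (Linux-only: reads /proc/loadavg). The 1-minute load
# average approximates the run queue; at or above the CPU count, expect
# response-time spikes.
cpus=$(getconf _NPROCESSORS_ONLN)
load=$(cut -d' ' -f1 /proc/loadavg)
awk -v l="$load" -v c="$cpus" 'BEGIN { exit !(l < c) }' \
  && echo "headroom left" \
  || echo "near congestion: load $load vs $cpus processors"
```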

Prod logs are usually from a few hours of peak traffic during the daytime. This 
reduces the amount of bot traffic in the logs. I filter out load balancer 
health checks, Zabbix checks, and so on. I like to get a log of a million 
queries. That might require grabbing peak traffic logs from several days.
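The filtering itself can be as simple as a grep over the raw access log; the monitoring patterns below (a ping handler and a Zabbix user agent) are illustrative guesses, not the actual checks:

```shell
# Tiny made-up access log; real logs and monitoring patterns will differ.
cat > access.log <<'EOF'
10.0.0.1 "GET /solr/coll/select?q=shoes HTTP/1.1" 200 "Mozilla/5.0"
10.0.0.2 "GET /solr/coll/admin/ping HTTP/1.1" 200 "HAProxy-check"
10.0.0.3 "GET /solr/coll/select?q=boots HTTP/1.1" 200 "Zabbix"
EOF

# Drop load balancer health checks and Zabbix checks; keep real queries.
grep -v -e 'admin/ping' -e 'Zabbix' access.log > queries.log
wc -l < queries.log   # only the one real user query survives
```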

With the master/slave cluster, I use logs from a single slave. Those will have 
a lower cache hit rate because the requests are randomly spread across the 
slaves. For our SolrCloud cluster, I’ve created a prod-size cluster in test. 
Expensive!

There’s a script in the JMeter config to make /handler and /select?qt=/handler 
get reported as the same thing. Thank you, SolrJ.
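A stand-in for that normalization, assuming the labels are bare request paths (the real version lives in the JMeter config): fold /select?qt=/handler onto /handler, then strip query strings so both request styles report under one label.

```shell
# Fold /select?qt=/handler onto /handler, then drop query strings.
# The label format here is an assumption, not the actual JMeter labels.
printf '%s\n' '/select?qt=/suggest&q=a' '/suggest?q=b' '/select?q=c' |
sed -E -e 's#^/select\?qt=(/[^&]+).*#\1#' -e 's#\?.*##'
# -> /suggest, /suggest, /select
```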

Our SLAs are for 95th percentile.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Apr 28, 2017, at 11:39 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> Well, the best way to get no cache hits is to set the cache sizes to
> zero ;). That provides worst-case scenarios and tells you exactly how
> much you're relying on caches. I'm not talking the lower-level Lucene
> caches here.
> 
> One thing I've done is use the TermsComponent to generate a list of
> terms actually in my corpus, and save them away "somewhere" to
> substitute into my queries. The problem with that is when you have
> anything except very simple queries involving AND, you generate
> unrealistic queries when you substitute in random values; you can be
> asking for totally unrelated terms and especially on short fields that
> leads to lots of 0-hit queries which are also unrealistic.
> 
> So you get into a long cycle of generating a bunch of queries and
> removing all queries with less than N hits when you run them. Then
> generating more. Then... And each time you pick N, it possibly introduces
> another layer of not-real-world.
> 
> Sometimes it's the best you can do, but if you can cull real-world
> applications it's _much_ better. Once you have a bunch (I like 10,000)
> you can be pretty confident. I not only like to run them randomly, but
> I also like to sub-divide them into N buckets and then run each bucket
> in order on the theory that that mimics what users actually did, they
> don't usually just do stuff at random. Any differences between the
> random and non-random runs can give interesting information.
> 
> Best,
> Erick
> 
> On Fri, Apr 28, 2017 at 9:38 AM, Rick Leir <rl...@leirtech.com> wrote:
>> (aside: Using Gatling or Jmeter?)
>> 
>> Question: How can you easily randomize something in the query so you get no 
>> cache hits? I think there are several levels of caching.
>> 
>> --
>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
