More “unrealistic” than “amazing”. I bet the set of test queries is smaller than the query result cache size.
Results from cache are about 2 ms, but network communication to the shards would add enough overhead to reach 40 ms.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 28, 2017, at 5:59 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> On 4/27/2017 5:20 PM, Suresh Pendap wrote:
>> Max throughput that I get: 12000 to 12500 reqs/sec
>> 95th percentile query latency: 30 to 40 msec
>
> These numbers are *amazing* ... far better than I would have expected to
> see on a 27GB index, even in a situation where it fits entirely into
> available memory. I would only expect a few hundred requests per second
> at most. Congratulations are definitely deserved.
>
> Adding more shards as Toke suggested *might* help, but it might also
> lower performance. More shards means that a single query from the
> user's perspective becomes more queries in the background. Unless you
> add servers to the cloud to handle the additional shards, more shards
> will usually slow things down on an index with a high query rate. On
> indexes with a very low query rate, more shards on the same hardware is
> likely to be faster, because there will be plenty of idle CPU capacity.
>
> What Toke said about filter queries is right on the money. Uncached
> filter queries are pretty expensive. Once a filter gets cached, it is
> SUPER fast ... but if you are constantly changing the filter query, then
> it is unlikely that new filters will be cached.
>
> When a particular query does not appear in either the queryResultCache
> or the filterCache, running it as a clause on the q parameter will
> usually be faster than running it as an fq parameter. If that exact
> query text will be used a LOT, then it makes sense to put it into a
> filter, where it will become very fast once it is cached.
>
> Thanks,
> Shawn
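To make the fq-vs-q trade-off above concrete, here is a small sketch that builds the two request forms Shawn is contrasting. The host, collection name ("products"), and field names are hypothetical, chosen only for illustration:

```python
# Sketch (hypothetical host, collection, and field names): the same
# restriction can be sent as a separate fq parameter, which Solr looks up
# in (and stores in) the filterCache, or folded into the q parameter,
# which skips the filterCache entirely.
from urllib.parse import urlencode

base = "http://localhost:8983/solr/products/select"

# Cached-filter form: pays off only when this exact fq string recurs,
# because the cached filter is keyed on the literal filter query.
fq_params = urlencode({"q": "name:laptop", "fq": "category:electronics"})

# Single-query form: usually faster for a one-off restriction that will
# not be repeated, since no filter is computed and cached separately.
q_params = urlencode({"q": "name:laptop AND category:electronics"})

print(f"{base}?{fq_params}")
print(f"{base}?{q_params}")
```

The key point is that the filterCache is keyed on the exact filter string, so a constantly changing fq (timestamps, per-user IDs) gets no cache benefit and only pays the cost of building the filter.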