Re: Facet Performance

James Bodkin Tue, 16 Jun 2020 07:19:17 -0700

We've changed the schema to enable docValues for these fields and this led to 
an improvement in the response time. We found a further improvement by also 
switching off indexed as these fields are used for faceting and filtering only.
Since those changes, we've found that the first-execution for queries is really 
noticeable. I thought this would be the filterCache based on what I saw in 
NewRelic however it is probably trying to read the docValues from disk. How can 
we use the autowarming to improve this?


For example, I've run the following queries in sequence and each query has a 
first-execution penalty.

Query 1:

q=*:*
facet=true
facet.field=D_DepartureAirport
facet.field=D_Destination
facet.limit=-1
rows=0

Query 2:

q=*:*
fq=D_DepartureAirport:(2660) 
facet=true
facet.field=D_Destination
facet.limit=-1
rows=0

Query 3:

q=*:*
fq=D_DepartureAirport:(2661)
facet=true
facet.field=D_Destination
facet.limit=-1
rows=0

Query 4:

q=*:*
fq=D_DepartureAirport:(2660+OR+2661)
facet=true
facet.field=D_Destination
facet.limit=-1
rows=0

We've kept the field type as a string, as the value is mapped by application 
that accesses Solr. In the examples above, the values are mapped to airports 
and destinations.
Is it possible to prewarm the above queries without having to define all the 
potential filters manually in the auto warming?

At the moment, we update and optimise our index in a different environment and 
then copy the index to our production instances by using a rolling deployment 
in Kubernetes.

Kind Regards,

James Bodkin

On 12/06/2020, 18:58, "Erick Erickson" <erickerick...@gmail.com> wrote:

    I question whether fiterCache has anything to do with it, I suspect what’s 
really happening is that first time you’re reading the relevant bits from disk 
into memory. And to double check you should have docVaues enabled for all these 
fields. The “uninverting” process  can be very expensive, and docValues 
bypasses that.

    As of Solr 7.6, you can define “uninvertible=true” to your field(Type) to 
“fail fast” if Solr needs to uninvert the field.

    But that’s an aside. In either case, my claim is that first-time execution 
does “something”, either reads the serialized docValues from disk or uninverts 
the file on Solr’s heap.

    You can have this autowarmed by any combination of
    1> specifying an autowarm count on your queryResultCache. That’s hit or 
miss, as it replays the most recent N queries which may or may not contain the 
sorts. That said, specifying 10-20 for autowarm count is usually a good idea, 
assuming you’re not committing more than, say, every 30 seconds. I’d add the 
same to filterCache too.

    2> specifying a newSearcher or firstSearcher query in solrconfig.xml. The 
difference is that newSearcher is fired every time a commit happens, while 
firstSearcher is only fired when Solr starts, the theory being that there’s no 
cache autowarming available when Solr fist powers up. Usually, people don’t 
bother with firstSearcher or just make it the same as newSearcher. Note that a 
query doesn’t have to be “real” at all. You can just add all the facet fields 
to a *:* query in a single go.

    BTW, Trie fields will stay around for a long time even though deprecated. 
Or at least until we find something to replace them with that doesn’t have this 
penalty, so I’d feel pretty safe using those and they’ll be more efficient than 
strings.

    Best,
    Erick

Re: Facet Performance

Reply via email to