On 8/23/2018 3:41 AM, zhenyuan wei wrote:
Thank you very much for the answer, @Jan Høydahl.
My query is simple: it just puts a wildcard on the last 2 characters (I have
other queries to optimize as well).

  curl "http://emr-worker-1:8983/solr/collection005/query?q=v10_s:OOOOOOOOVVVVVVVVYY*&rows=10&&fl=id&echoParams=all"

I think that's the answer right there -- wildcard query. Wildcard queries have a tendency to be slow, because of how they work. What is the nature of your v10_s field? Does that wildcard query match a lot of terms?

When a wildcard query executes, Solr asks the index for all terms that match it, and then constructs a query with all of those terms in it. If there are ten million terms that match the wildcard, the query will *quite literally* have ten million entries inside it. Every one of the terms will need to be separately searched against the index. Each term will be fast, but it adds up if there are a lot of them.

This query had a numFound larger than one hundred thousand, which suggests that there were at least that many terms in the query. So basically in the time it took, Solr first gathered a huge list of terms, and then internally executed over one hundred thousand individual queries.
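The expansion described above can be sketched as a toy model in Python. This is only an illustration of the idea, not Lucene's actual code: the `postings` dictionary, the shortened term values, and both function names are made up for the example.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix):
    """Return every indexed term that starts with `prefix` (a trailing-* wildcard)."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # terms are sorted, so no later term can match
        matches.append(term)
    return matches


def wildcard_search(postings, prefix):
    """Expand the wildcard into terms, then look up each term separately."""
    terms = expand_prefix(sorted(postings), prefix)
    doc_ids = set()
    for term in terms:  # one lookup per expanded term
        doc_ids |= postings[term]
    return terms, doc_ids


# Hypothetical toy index: term -> set of document ids.
postings = {
    "OOVVYY01": {1},
    "OOVVYY02": {2, 3},
    "OOVVZZ01": {4},
    "AAVVYY01": {5},
}

terms, docs = wildcard_search(postings, "OOVVYY")
print(len(terms), sorted(docs))  # 2 [1, 2, 3]
```

The loop in `wildcard_search` is the part that grows with the number of matching terms: two terms here, but over one hundred thousand in a case like yours.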

Changing your field definition so you can avoid wildcard queries will go a long way towards speeding things up. Typically this involves some kind of ngram tokenizer or filter. It will make the index much larger, but tends to speed things up.
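To show why the ngram approach trades index size for speed, here is a minimal Python sketch of edge ngrams (prefixes only). It is not Solr code, and the gram sizes, sample values, and function names are assumptions for the illustration. In Solr itself, something like EdgeNGramFilterFactory on the index-time analyzer would play this role.

```python
def edge_ngrams(value, min_gram=1, max_gram=20):
    """Return the prefixes of `value` with lengths min_gram..max_gram."""
    upper = min(max_gram, len(value))
    return [value[:n] for n in range(min_gram, upper + 1)]


def build_index(docs, min_gram=1, max_gram=20):
    """Index every prefix of every value: gram -> set of document ids."""
    index = {}
    for doc_id, value in docs.items():
        for gram in edge_ngrams(value, min_gram, max_gram):
            index.setdefault(gram, set()).add(doc_id)
    return index


# Hypothetical field values, shortened for readability.
docs = {1: "OOVVYY01", 2: "OOVVYY02", 3: "OOVVZZ01"}
index = build_index(docs)

# The prefix search is now a single exact-term lookup, no expansion needed.
print(sorted(index.get("OOVVYY", set())))  # [1, 2]

# The cost: many more terms in the index than original values.
print(len(index), "index terms for", len(docs), "values")
```

The query side must not be ngrammed in this scheme; the user's prefix is looked up as one whole term.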

Your example says the QTime is 125 milliseconds, and your message talks about times of 40 milliseconds.  This is NOT slow. If you're trying to maximize queries per second, you need to know that handling a high query load requires multiple servers handling multiple replicas of your index, and some kind of load balancing.

Configuring caches cannot speed up the first time a query runs.  Caches only speed up later runs.  Speeding up the first run will require two things:

1) Ensuring that there is enough memory in the system for the operating system to effectively cache the index.  This is memory *beyond* the java heap that is not allocated to any program.

2) Changing the query to a type that executes faster and adjusting the schema to allow the new type to work.  Wildcard queries are one of the worst options.
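For the first point, a rough back-of-the-envelope check can be written down as follows. This is only a sketch: the function names are invented and every number is a hypothetical placeholder, so substitute the real values from your servers.

```python
def page_cache_headroom_gb(total_ram_gb, java_heap_gb, other_programs_gb=0.0):
    """Memory left for the OS disk cache: RAM not claimed by the heap or other programs."""
    return total_ram_gb - java_heap_gb - other_programs_gb


def index_fraction_cacheable(index_size_gb, headroom_gb):
    """Fraction of the on-disk index that could sit in the OS disk cache (0.0 to 1.0)."""
    if index_size_gb <= 0:
        return 1.0
    return max(0.0, min(1.0, headroom_gb / index_size_gb))


# Hypothetical server: 64 GB RAM, 8 GB Solr heap, 2 GB used by other software.
headroom = page_cache_headroom_gb(total_ram_gb=64, java_heap_gb=8, other_programs_gb=2)
print(headroom)  # 54

# Hypothetical 108 GB of index data on that server.
print(index_fraction_cacheable(index_size_gb=108, headroom_gb=headroom))  # 0.5
```

A fraction well below 1.0 means a good share of first-time queries will have to read from disk.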

In a later message, you indicated that your cache autowarmCount values are mostly set to zero, and that the one cache with a nonzero setting is using NoOpRegenerator, so it's not actually doing any warming.  This means that anytime you make a change to the index, your caches are completely gone.  With auto warming, the most recent entries in the old cache will be re-executed to warm the new caches.  This can help with performance, but if you make autowarmCount too large, it will make commits take a very long time.  Note that documentCache actually doesn't do warming, so that setting is irrelevant on that cache.
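The effect of autowarmCount can be illustrated with a small Python model. It is a simplification I made up for this explanation (the `QueryCache` class and `run_query` callback are hypothetical), not how Solr implements its caches.

```python
from collections import OrderedDict


class QueryCache:
    """Toy cache: remembers query results, most recently used last."""

    def __init__(self, autowarm_count=0):
        self.autowarm_count = autowarm_count
        self.entries = OrderedDict()

    def get(self, query, run_query):
        if query in self.entries:
            self.entries.move_to_end(query)  # mark as most recently used
            return self.entries[query]
        result = run_query(query)  # cache miss: pay the full query cost
        self.entries[query] = result
        return result

    def warm_from(self, old_cache, run_query):
        """Re-execute the most recent queries of the old cache against the new index."""
        if self.autowarm_count <= 0:
            return  # autowarm of zero: the new cache starts empty
        recent = list(old_cache.entries)[-self.autowarm_count:]
        for query in recent:
            self.entries[query] = run_query(query)


executed = []


def run_query(query):
    executed.append(query)  # record each real execution
    return f"results for {query}"


old = QueryCache(autowarm_count=2)
for q in ["a", "b", "c"]:
    old.get(q, run_query)

# A commit happens: a new cache replaces the old one and is warmed from it.
new = QueryCache(autowarm_count=2)
executed.clear()
new.warm_from(old, run_query)
print(executed)  # ['b', 'c']
```

Each warmed entry costs one real query execution at commit time, which is why a large count slows commits down.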

Thanks,
Shawn
