On 8/23/2018 3:41 AM, zhenyuan wei wrote:
Thank you very much for answering, @Jan Høydahl.
My query is simple, it just wildcards the last 2 chars (I have more other
queries to optimize)
curl "http://emr-worker-1:8983/solr/collection005/query?q=v10_s:OOOOOOOOVVVVVVVVYY*&rows=10&&fl=id&echoParams=all"
I think that's the answer right there -- wildcard query. Wildcard
queries have a tendency to be slow, because of how they work. What is
the nature of your v10_s field? Does that wildcard query match a lot of
terms? When a wildcard query executes, Solr asks the index for all
terms that match it, and then constructs a query with all of those terms
in it. If there are ten million terms that match the wildcard, the
query will *quite literally* have ten million entries inside it. Every
one of the terms will need to be separately searched against the index.
Each term will be fast, but it adds up if there are a lot of them. This
query had a numFound larger than one hundred thousand, which suggests
that there were at least that many terms in the query. So basically, in
the time it took, Solr first gathered a huge list of terms, and then
internally executed over one hundred thousand individual queries.
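To make the expansion step concrete, here is a toy model in Python of what
I described above. It is only a sketch of the idea, not Lucene's actual
code, and the index contents and function names are made up for
illustration.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix):
    """Return every indexed term that starts with `prefix`."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # terms are sorted, so no later term can match
        matches.append(term)
    return matches


def wildcard_search(postings, prefix):
    """Expand the prefix into terms, then look up each term separately."""
    expanded = expand_prefix(sorted(postings), prefix)
    doc_ids = set()
    for term in expanded:  # one lookup per matching term
        doc_ids.update(postings[term])
    return expanded, doc_ids


# Hypothetical index: term -> ids of the documents containing it
postings = {
    "OOOOOOOOVVVVVVVVYY01": [1],
    "OOOOOOOOVVVVVVVVYY02": [2],
    "OOOOOOOOVVVVVVVVYZ01": [3],
    "XXXX": [4],
}
expanded, doc_ids = wildcard_search(postings, "OOOOOOOOVVVVVVVVYY")
print(len(expanded), sorted(doc_ids))
```

The cost of the loop grows with the number of terms that match the
prefix, not with the number of rows you ask for.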
Changing your field definition so you can avoid wildcard queries will go
a long way towards speeding things up. Typically this involves some kind
of ngram tokenizer or filter. It will make the index much larger, but
tends to speed things up.
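As a rough illustration of the trade-off, the Python sketch below indexes
the leading substrings (edge ngrams) of each value, so that a prefix
search becomes one exact term lookup. The minimum and maximum gram
lengths and the sample values are assumptions chosen for the example, and
this is not how Solr implements its ngram filters internally.

```python
def edge_ngrams(value, min_len, max_len):
    """Return the leading substrings of `value` with lengths min_len..max_len."""
    upper = min(max_len, len(value))
    return [value[:n] for n in range(min_len, upper + 1)]


def build_index(docs, min_len, max_len):
    """Map every edge ngram to the ids of the documents that produced it."""
    index = {}
    for doc_id, value in docs.items():
        for gram in edge_ngrams(value, min_len, max_len):
            index.setdefault(gram, set()).add(doc_id)
    return index


# Hypothetical field values keyed by document id
docs = {1: "ABCD01", 2: "ABCD02", 3: "ABCE01"}
index = build_index(docs, min_len=2, max_len=6)

# The prefix search ABCD* is now a single exact lookup of the term "ABCD".
print(sorted(index.get("ABCD", set())))
print(len(index), "terms for", len(docs), "documents")
```

The second print shows the cost: the index holds several terms per
document instead of one.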
Your example says the QTime is 125 milliseconds, and your message talks
about times of 40 milliseconds. This is NOT slow. If you're trying to
maximize queries per second, you need to know that handling a high query
load requires multiple servers handling multiple replicas of your index,
and some kind of load balancing.
Configuring caches cannot speed up the first time a query runs. Caches
only speed up later runs. To speed up the first time will require two things:
1) Ensuring that there is enough memory in the system for the operating
system to effectively cache the index. This is memory *beyond* the java
heap that is not allocated to any program.
2) Changing the query to a type that executes faster and adjusting the
schema to allow the new type to work. Wildcard queries are one of the
worst options.
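For the first point, a back-of-the-envelope check can be sketched in
Python as below. All of the numbers are hypothetical, and treating "the
whole index fits" as the target is an assumption of this sketch, not a
hard rule.

```python
def page_cache_headroom_gb(total_ram_gb, java_heap_gb, other_processes_gb):
    """Memory left over for the operating system's disk cache."""
    return total_ram_gb - java_heap_gb - other_processes_gb


def index_fits_in_cache(index_size_gb, headroom_gb):
    """True if the whole index could sit in the OS disk cache."""
    return headroom_gb >= index_size_gb


# Hypothetical server: 64 GB RAM, 16 GB Java heap, 4 GB for everything else
headroom = page_cache_headroom_gb(
    total_ram_gb=64, java_heap_gb=16, other_processes_gb=4
)
print(headroom, index_fits_in_cache(index_size_gb=30, headroom_gb=headroom))
```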
In a later message, you indicated that your cache autowarmCount values
are mostly set to zero. This means that any time you make a change to
the index, your caches are completely gone. The one cache with a
nonzero setting is using NoOpRegenerator, so it's not actually doing
any warming either. With autowarming, the most recent entries in the
old cache are re-executed to populate the new cache. This can help with
performance, but if you make autowarmCount too large, it will make
commits take a very long time. Note that documentCache doesn't actually
do warming, so that setting is irrelevant on that cache.
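The effect of autowarmCount can be pictured with a toy cache in Python.
This is a simplified model of the behavior described above, not Solr's
cache implementation, and the class and function names are invented for
the example.

```python
from collections import OrderedDict


class ToyCache:
    """A tiny stand-in for a query cache; not Solr's implementation."""

    def __init__(self, autowarm_count):
        self.autowarm_count = autowarm_count
        self.entries = OrderedDict()  # query -> result, oldest first

    def get(self, query, run_query):
        if query in self.entries:
            self.entries.move_to_end(query)  # mark as most recently used
            return self.entries[query]
        result = run_query(query)
        self.entries[query] = result
        return result

    def on_commit(self, run_query):
        """Build the cache for the new index view, warming recent entries."""
        recent = []
        if self.autowarm_count > 0:
            recent = list(self.entries)[-self.autowarm_count:]
        new_cache = ToyCache(self.autowarm_count)
        for query in recent:
            new_cache.entries[query] = run_query(query)  # re-executed
        return new_cache


executed = []


def run_query(query):
    executed.append(query)
    return "results for " + query


cache = ToyCache(autowarm_count=2)
for q in ["a", "b", "c"]:
    cache.get(q, run_query)

executed.clear()
warmed = cache.on_commit(run_query)
print(list(warmed.entries), executed)
```

With autowarm_count=0 the new cache starts empty. With a large value,
every commit has to re-run that many queries before it finishes.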
Thanks,
Shawn