On 11/2/2018 1:38 PM, Chuming Chen wrote:
> I am running a Solr cloud 7.4 with 4 shards and 4 nodes (JVM "-Xms20g
> -Xmx40g"); each shard has 32 million documents and is 32GB in size.
A 40GB heap is probably completely unnecessary for an index of that
size. Does each machine have one replica on it, or two? If you are
trying for high availability, then you will need at least two shard
replicas per machine.
The values of -Xms and -Xmx should normally be set the same. Java will
tend to allocate the entire max heap it has been allowed eventually, so
it's usually better to just let it have the whole amount right up front.
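As a sketch, the heap can be set in the solr.in.sh include script
(solr.in.cmd on Windows); the value below is only a placeholder, not a
sizing recommendation for your index:

```shell
# In solr.in.sh (location varies by install -- often /etc/default/ or the
# directory next to bin/solr). SOLR_HEAP sets -Xms and -Xmx to the same
# value. "8g" is a placeholder, not a recommendation for this index.
SOLR_HEAP="8g"

# Equivalent explicit form:
# SOLR_JAVA_MEM="-Xms8g -Xmx8g"
```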
> For a given query (I use the complexphrase query), it typically took a
> couple of seconds to return the first 20 docs the first time. However,
> fetching the following page, sorting by a field, or even running the
> same query again took a lot longer to return results. I can see my 4
> Solr nodes running crazy at more than 100% CPU.
Can you obtain a screenshot of a process listing as described at the
following URL, and provide the image using a file sharing site?
https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue
There are separate instructions there for Windows and for Linux/UNIX
operating systems.
Also useful are the GC logs that are written by Java when Solr is
started using the included scripts. I'm looking for logfiles that cover
several days of runtime. You'll need to share them with a file sharing
website -- files will not normally make it to the mailing list if
attached to a message.
Getting a copy of the solrconfig.xml in use on your collection can also
be helpful.
> My understanding is that Solr has a query cache, so running the same
> query should be faster.
If the query is absolutely identical in *every* way, then yes, it can be
satisfied from Solr caches, if their size is sufficient. If you change
ANYTHING, including things like rows or start, filters, sorting, facets,
and other parameters, then the query probably cannot be satisfied
completely from cache. At that point, Solr is very reliant on how much
memory has NOT been allocated to programs -- there must be enough
unallocated memory for the operating system to effectively cache the
Solr index data.
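As a concrete illustration of why pagination misses the cache (the
collection name, host, and query below are hypothetical, and no live
Solr is contacted):

```shell
# Illustration only -- two requests that differ in a single parameter.
BASE='http://localhost:8983/solr/patternmatch/select'
REQ1="$BASE?q={!complexphrase}text:\"slow query\"&rows=20&start=0"
REQ2="$BASE?q={!complexphrase}text:\"slow query\"&rows=20&start=20"

# Re-sending REQ1 unchanged can be answered from Solr's caches; REQ2
# differs only in start, but that is enough to map to a different
# cache entry and force the query to be executed again.
if [ "$REQ1" = "$REQ2" ]; then
  echo "same request: cacheable repeat"
else
  echo "changed request: likely cache miss"
fi
```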
> What could be wrong here? How do I debug? I checked solr.log on all
> nodes and didn't see anything unusual. The most frequent log entry
> looks like this.
> INFO - 2018-11-02 19:32:55.189; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null
> path=/admin/metrics
> params={wt=javabin&version=2&key=solr.core.patternmatch.shard3.replica_n8:UPDATE./update.requests&key=solr.core.patternmatch.shard3.replica_n8:INDEX.sizeInBytes&key=solr.core.patternmatch.shard1.replica_n1:QUERY./select.requests&key=solr.core.patternmatch.shard1.replica_n1:INDEX.sizeInBytes&key=solr.core.patternmatch.shard1.replica_n1:UPDATE./update.requests&key=solr.core.patternmatch.shard3.replica_n8:QUERY./select.requests}
> status=0 QTime=7
> INFO - 2018-11-02 19:32:55.192; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null
> path=/admin/metrics
> params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used}
> status=0 QTime=1
That is not a query. It is a call to the Metrics API. When I've made
this call on a production Solr machine, it has been very
resource-intensive, taking a long time to complete. I don't think it
should be made frequently -- probably no more than once a minute. If you
are seeing that kind of entry in your logs a lot, it might be
contributing to your performance issues.
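If those frequent /admin/metrics calls turn out to come from Solr's own
metrics history collection (added in 7.4, and it polls on a schedule),
it can be disabled in solr.xml. The snippet below is a sketch from my
memory of the 7.x reference guide -- verify the exact syntax against the
documentation for your version before using it:

```xml
<!-- In solr.xml: disable the built-in metrics history collection.
     Syntax is an assumption based on the Solr 7.x reference guide. -->
<metrics>
  <history>
    <bool name="enable">false</bool>
  </history>
</metrics>
```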
Thanks,
Shawn