On 11/2/2018 1:38 PM, Chuming Chen wrote:
> I am running a Solr cloud 7.4 with 4 shards and 4 nodes (JVM "-Xms20g
> -Xmx40g"), each shard has 32 million documents and 32Gbytes in size.

A 40GB heap is probably completely unnecessary for an index of that size.  Does each machine have one replica on it, or two?  If you are aiming for high availability, there will be at least two shard replicas per machine.

The values of -Xms and -Xmx should normally be set the same.  Java tends to eventually allocate the entire maximum heap it has been allowed, so it's usually better to just let it have the whole amount right up front.
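With the scripts included with Solr, the heap is normally set in solr.in.sh (solr.in.cmd on Windows).  A sketch of setting min and max to the same value -- the 20g figure is illustrative, and the file's location depends on how Solr was installed:

```shell
# In solr.in.sh (often /etc/default/solr.in.sh for a service install --
# path is an assumption, adjust for your setup).
# SOLR_HEAP sets both -Xms and -Xmx to the same value.
SOLR_HEAP="20g"

# Equivalent explicit form:
# SOLR_JAVA_MEM="-Xms20g -Xmx20g"
```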

> For a given query (I use complexphrase query), typically, the first time it
> took a couple of seconds to return the first 20 docs. However, for the
> following page, or sorting by a field, even run the same query again took a lot
> longer to return results. I can see my 4 solr nodes running crazy with more
> than 100%CPU.

Can you obtain a screenshot of a process listing as described at the following URL, and provide the image using a file sharing site?

https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue

There are separate instructions there for Windows and for Linux/UNIX operating systems.

Also useful are the GC logs that Java writes when Solr is started with the included scripts.  I'm looking for logfiles that cover several days of runtime.  You'll need to share them with a file sharing website -- attachments will not normally make it to the mailing list.

Getting a copy of the solrconfig.xml in use on your collection can also be helpful.

> My understanding is that Solr has query cache, run same query should be faster.

If the query is absolutely identical in *every* way, then yes, it can be satisfied from Solr's caches, if their size is sufficient.  If you change ANYTHING -- including things like rows, start, filters, sorting, facets, and other parameters -- then the query probably cannot be satisfied completely from cache.  At that point, Solr is very reliant on memory that has NOT been allocated to programs: there must be enough free memory for the operating system to cache the Solr index data effectively.
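The relevant cache settings live in solrconfig.xml, inside the <query> section.  A sketch, with purely illustrative sizes (not recommendations) -- queryResultWindowSize in particular affects whether paging to the next 20 rows can still hit the cache:

```xml
<!-- In solrconfig.xml, inside <query>.  Sizes here are illustrative only. -->
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="32"/>

<!-- How many document IDs to cache per query result.  A value larger than
     your page size lets a request for the next page (start=20, rows=20)
     be answered from the cached result window. -->
<queryResultWindowSize>60</queryResultWindowSize>
```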

> What could be wrong here? How do I debug? I checked solr.log in all nodes and
> didn’t see anything unusual. Most frequent log entry looks like this.
>
> INFO  - 2018-11-02 19:32:55.189; [   ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null
> path=/admin/metrics
> params={wt=javabin&version=2&key=solr.core.patternmatch.shard3.replica_n8:UPDATE./update.requests&key=solr.core.patternmatch.shard3.replica_n8:INDEX.sizeInBytes&key=solr.core.patternmatch.shard1.replica_n1:QUERY./select.requests&key=solr.core.patternmatch.shard1.replica_n1:INDEX.sizeInBytes&key=solr.core.patternmatch.shard1.replica_n1:UPDATE./update.requests&key=solr.core.patternmatch.shard3.replica_n8:QUERY./select.requests}
>  status=0 QTime=7
> INFO  - 2018-11-02 19:32:55.192; [   ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null
> path=/admin/metrics
> params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used}
>  status=0 QTime=1

That is not a query.  It is a call to the Metrics API.  When I have made this call on a production Solr machine, it has been very resource-intensive and slow.  I don't think it should be made frequently -- probably no more than once a minute.  If you are seeing that kind of entry in your logs a lot, it might be contributing to your performance issues.

Thanks,
Shawn
