Hi all, we have been facing load incidents caused by increased GC count and GC time, which drive up response times and cause timeouts.
Solr Cloud Cluster Details

We use SolrCloud v8.10 (Java 8, G1 GC) with 8 shards, each hosted on its own VM with 16 cores and 50 GB RAM. Each shard is ~28 GB, and the Solr heap is 16 GB (heap is used mainly by the filter, document, and queryResults caches, each of size 512).

Problem Details

We pause indexing at 11 AM during peak search hours. Normally the system remains stable through the peak, but when the document update count on Solr is higher before peak hours (between 5:30 AM and 11 AM), we face multiple load issues: GC count and GC time increase, CPU is consumed by GC itself, and load and response times rise.

To mitigate this, we recently increased the RAM on the servers (from 42 GB to 50 GB) to reduce the I/O wait from repeatedly paging the Solr index into memory. Going a step further, we also increased the Solr heap from 12 GB to 16 GB (and tried other sizes: 14 GB, 15 GB, 18 GB). Although the lower I/O wait reduced the load issues somewhat, the problem still recurs whenever heavy indexing is done.

We have explored options such as expungeDeletes, which may help reduce the percentage of deleted documents, but it cannot be executed close to peak hours, since it increases I/O wait, which in turn spikes Solr's load and response times significantly.

1. Apart from changing the expungeDeletes timing, is there another option we can try to mitigate this problem?
2. Approximately 60 million documents are updated each day, i.e. ~30% of the complete Solr index is modified daily while serving ~20 million search requests. We would appreciate any guidance on handling such high indexing + search traffic during peak hours.
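For context, the expungeDeletes pass we run off-peak goes through Solr's standard /update endpoint as a commit option. The sketch below only builds and sends that request; the base URL and collection name are placeholders, and it assumes the default JSON update format:

```python
import json
from urllib import request

SOLR_BASE = "http://localhost:8983"   # placeholder: your Solr node
COLLECTION = "my_collection"          # placeholder: your collection name

def build_expunge_deletes_request(base_url: str, collection: str):
    """Build the URL and JSON body for a commit with expungeDeletes=true.

    expungeDeletes asks Lucene to merge away segments dominated by deleted
    documents; it is cheaper than a full optimize but still I/O-heavy, so it
    should be scheduled well away from peak hours.
    """
    url = f"{base_url}/solr/{collection}/update"
    body = json.dumps({"commit": {"expungeDeletes": True,
                                  "waitSearcher": False}})
    return url, body

def send_expunge_deletes(base_url: str, collection: str) -> int:
    """POST the commit request; requires a running Solr node."""
    url, body = build_expunge_deletes_request(base_url, collection)
    req = request.Request(url,
                          data=body.encode("utf-8"),
                          headers={"Content-Type": "application/json"},
                          method="POST")
    with request.urlopen(req) as resp:
        return resp.status
```

Setting waitSearcher to false keeps the request from blocking until the new searcher is warmed, which matters when the call is fired from a cron job.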
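Since we are on G1 with Java 8, another avenue we are considering is tuning the collector and capturing GC logs during the heavy-indexing window so pauses can be correlated with update bursts. A sketch of what that might look like in bin/solr.in.sh; the specific values are illustrative starting points, not a tuned recommendation:

```shell
# Illustrative G1 settings for bin/solr.in.sh (Java 8). Values are
# assumptions to be validated against your own GC logs.
GC_TUNE="-XX:+UseG1GC \
  -XX:MaxGCPauseMillis=250 \
  -XX:+ParallelRefProcEnabled \
  -XX:G1HeapRegionSize=16m \
  -XX:InitiatingHeapOccupancyPercent=60"

# Java 8 GC logging flags, so GC spikes can be lined up with the
# 5:30 AM - 11 AM indexing window.
GC_LOG_OPTS="-verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps \
  -XX:+PrintGCApplicationStoppedTime"
```

Lowering InitiatingHeapOccupancyPercent starts concurrent marking earlier, which can help when heavy indexing churns the old generation, at the cost of more concurrent GC CPU.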