Hi All,

We have been facing load incidents caused by increased GC count and GC time,
which drive up response times and cause timeouts.

Solr Cloud Cluster Details

We use SolrCloud v8.10 [with Java 8 and G1 GC] with 8 shards, where each
shard sits on its own VM with 16 cores and 50 GB RAM. Each shard is ~28 GB
and the Solr heap is 16 GB [the heap is used only for the filter, document,
and queryResults caches, each of size 512].
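For reference, the relevant settings look roughly like the following (the
cache class and autowarm counts shown are illustrative Solr 8 defaults, not
copied verbatim from our config):

In solr.in.sh:
  SOLR_HEAP="16g"
  GC_TUNE="-XX:+UseG1GC"

In solrconfig.xml:
  <filterCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.CaffeineCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.CaffeineCache" size="512" initialSize="512"/>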

Problem Details

We pause indexing at 11 AM for the peak searching hours. Normally the system
remains stable during peak hours, but when the document update count on Solr
is higher before peak hours [between 5:30 AM and 11 AM], we face multiple
load issues: GC count and GC time increase, CPU is consumed by GC itself,
and the load and response time of the system go up. To mitigate this, we
recently increased the RAM on the servers [from 42 GB to 50 GB] to reduce
the I/O wait from loading the Solr index into memory multiple times. Taking
it a step further, we also increased the Solr heap from 12 GB to 16 GB [and
tried other combinations such as 14 GB, 15 GB, and 18 GB]. Although the
lower I/O wait reduced the load issues somewhat, the problem still recurs
whenever heavy indexing is done.
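For completeness, the GC behaviour above is what we observe with standard
JDK tooling; one minimal way to sample it on Java 8 is (the PID is a
placeholder):

  # print GC counts and cumulative GC time every 5 seconds
  jstat -gcutil <solr_pid> 5000

During the incidents it is the GC count and cumulative GC time columns that
climb, matching the load spikes.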

We have explored a few options such as expungeDeletes, which may help reduce
the percentage of deleted documents, but it cannot be executed close to peak
hours, as it increases I/O wait, which in turn spikes Solr's load and
response time significantly.
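For reference, the expungeDeletes pass we are referring to is just a commit
with expungeDeletes enabled, roughly like this (host and collection name are
placeholders):

  curl "http://<solr-host>:8983/solr/<collection>/update" \
    -H "Content-Type: text/xml" \
    --data-binary '<commit expungeDeletes="true"/>'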


1. Apart from changing the expungeDeletes timing, is there another option
   we can try to mitigate this problem?

2. Approximately 60 million documents are updated each day, i.e. ~30% of
   the complete Solr index is modified daily while serving ~20 million
   search requests. We would appreciate any advice on how to handle such
   high indexing + searching traffic during peak hours.
