Use TLOG + PULL replicas; they should improve the situation significantly.
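
To make that concrete, here is a minimal sketch of how such a layout could be requested through the Collections API. The base URL, the collection name `products`, and the replica counts are placeholders of mine, not values from your cluster; `tlogReplicas`, `pullReplicas`, `nrtReplicas` and `shards.preference` are the standard Solr parameters. The helpers only build the request, they do not send anything.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, tlog_replicas, pull_replicas):
    """Build a Collections API CREATE URL that uses TLOG + PULL replicas only."""
    params = {
        "action": "CREATE",
        "name": name,                    # hypothetical collection name
        "numShards": num_shards,
        "nrtReplicas": 0,                # no NRT replicas in this layout
        "tlogReplicas": tlog_replicas,   # index locally when leader, eligible for leadership
        "pullReplicas": pull_replicas,   # only copy segments from the leader, serve searches
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def search_params(query):
    """Query parameters that ask Solr to prefer PULL replicas for searches."""
    return {"q": query, "shards.preference": "replica.type:PULL"}


if __name__ == "__main__":
    # Placeholder host and counts, for illustration only.
    print(create_collection_url("http://localhost:8983/solr", "products", 8, 1, 2))
    print(search_params("*:*"))
```

The idea is that indexing work stays on the TLOG leaders while searches are routed to PULL replicas, which only replicate finished segments. Note that PULL replicas see new documents only after the next replication from the leader, so searches can lag slightly behind indexing.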

Thu, 20 Jun 2024, 07:27 Saksham Gupta
<saksham.gu...@indiamart.com.invalid>:

> Hi All,
>
> We have been facing load incidents caused by higher GC count and GC time,
> which lead to higher response times and timeouts.
>
> Solr Cloud Cluster Details
>
> We use SolrCloud v8.10 [with Java 8 and G1 GC] with 8 shards, where each
> shard is hosted on a single VM with 16 cores and 50 GB RAM. Each shard is
> ~28 GB and the Solr heap is 16 GB [heap is used only for the filter,
> document, and queryResults caches, each of size 512].
>
> Problem Details
>
> We pause indexing at 11 AM during peak searching hours. Normally the system
> remains stable during the peak hours, but when the document update count on
> Solr is higher before peak hours [between 5:30 AM and 11 AM], we face
> multiple load issues. GC count and GC time increase, and CPU is consumed
> by GC itself, which increases the load and response time of the system.
> To mitigate this, we recently increased the RAM on the servers [to 50 GB
> from 42 GB previously], so as to reduce the I/O wait from writing the Solr
> index into memory multiple times. Taking a step further, we also increased
> the Solr heap from 12 to 16 GB [we also tried other combinations like 14
> GB, 15 GB, 18 GB]. Although we saw some reduction in load issues due to
> lower I/O wait, the issue still recurs when heavier indexing is done.
>
> We have explored a few options like expunge deletes, which may help reduce
> the percentage of deleted documents, but that cannot be executed close to
> peak hours, as it increases I/O wait, which further spikes the load and
> response time of Solr significantly.
>
>
>    1. Apart from changing the expunge deletes timing, is there another option
>       which we can try to mitigate this problem?
>    2. Approximately 60 million documents are updated each day, i.e. ~30% of the
>       complete Solr index is modified each day, while serving ~20 million search
>       requests. We would appreciate any advice on how to handle such high
>       indexing + searching traffic during peak hours.
>
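
For a rough sense of scale, the figures quoted above can be turned into per-shard and per-second numbers. This is only a back-of-the-envelope sketch: it assumes documents are spread evenly across the 8 shards and that updates and searches are spread evenly over 24 hours. Neither holds in practice (indexing is paused at 11 AM and searches peak afterwards), so the real peak rates will be higher than these averages.

```python
# Figures taken from the quoted message.
DOCS_UPDATED_PER_DAY = 60_000_000
UPDATED_FRACTION = 0.30          # ~30% of the index is modified each day
SEARCHES_PER_DAY = 20_000_000
NUM_SHARDS = 8
SECONDS_PER_DAY = 24 * 60 * 60

# Derived estimates (assumption: even distribution across shards and time).
total_docs = DOCS_UPDATED_PER_DAY / UPDATED_FRACTION
docs_per_shard = total_docs / NUM_SHARDS
avg_updates_per_sec = DOCS_UPDATED_PER_DAY / SECONDS_PER_DAY
avg_searches_per_sec = SEARCHES_PER_DAY / SECONDS_PER_DAY

if __name__ == "__main__":
    print(f"total documents:      ~{total_docs / 1e6:.0f} million")
    print(f"documents per shard:  ~{docs_per_shard / 1e6:.0f} million")
    print(f"average updates/sec:  ~{avg_updates_per_sec:.0f}")
    print(f"average searches/sec: ~{avg_searches_per_sec:.0f}")
```

That works out to roughly 200 million documents in total, about 25 million per shard, and on the order of 700 updates and 230 searches per second when averaged over a full day.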
