Hello,

The attachments (of top and iotop) did not come through. Could you also post
your GC configuration and the GC pause results?
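
For example, the heap and GC settings from solr.in.sh would be a good start
(the values below are only placeholders to show the format, not your actual
configuration):

    SOLR_HEAP="16g"
    GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250"

The GC logs (solr_gc.log* under the Solr logs directory, assuming the default
logging setup) covering an incident window would also help.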

Thanks.


Deepak
"The greatness of a nation can be judged by the way its animals are treated
- Mahatma Gandhi"

+91 73500 12833
deic...@gmail.com

LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

Make In India : http://www.makeinindia.com/home


On Sun, Jun 23, 2024 at 2:41 PM Saksham Gupta
<saksham.gu...@indiamart.com.invalid> wrote:

> Hi All,
>
> Thanks for all the engagement.
>
> *Use tlog+pull replicas, they will improve the situation significantly:* Do
> you mean using tlog/pull replicas to serve search requests and a separate
> set of replicas for indexing? We have tried this in the past, but it either
> requires a separate set of infra [which will double the cost], or all the
> traffic gets redirected onto the NRT replicas if any issue is faced on the
> pull/tlog replicas.
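>
> For reference, the routing being discussed is Solr's shards.preference
> parameter; a query that prefers PULL and then TLOG replicas looks like this
> (host and collection name are placeholders):
>
>     curl "http://localhost:8983/solr/<collection>/select?q=*:*&shards.preference=replica.type:PULL,replica.type:TLOG"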
>
> *Can you please tell me about the hardware details (Server type, CPU speed
> and type, Disk Speed and type) and GC configuration? Also please post
> results of top, iotop if you can?*
>
> *CPU Details:*
>
> model name: Intel(R) Xeon(R) CPU @ 2.80GHz
>
> cpu MHz         : 2800.198
>
> *Disk speed:* The VMs used are GCP n2 machines with a 16-core CPU and 50
> GB of RAM [an n2 machine with >=16 vCPUs has an SSD persistent disk with a
> maximum throughput of 1,200 MB/s, Source
> <https://cloud.google.com/compute/docs/disks/performance#pd-ssd>]
>
>
> *Top/iotop Result:* Please find attached the load and io wait stats for a
> solr server that faced this issue on 11 June 2024, from 11 AM to 2 PM, when
> load was consistently higher than usual; indexing was also higher than
> usual that day.
>
> For IO wait stats: io-wait-details-solr-node-11-June-2024
>
> For Load stats: load-details-solr-node-11-June-2024
>
> *Are you having iowait, gc pauses, or something else? Do you commit often
> or in one big batch?*
>
> The iowait is <0.05.
>
> Commits are configured such that autoSoftCommit runs once every hour and
> autoCommit runs every 30 minutes [with openSearcher=false]. Please refer to
> the io wait stats for the duration when the load exists; no correlation
> with io wait was found!
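>
> For reference, that commit schedule corresponds to the following Config API
> properties (host and collection name are placeholders):
>
>     curl -X POST -H 'Content-Type: application/json' \
>       "http://localhost:8983/solr/<collection>/config" -d '{
>         "set-property": {
>           "updateHandler.autoCommit.maxTime": 1800000,
>           "updateHandler.autoCommit.openSearcher": false,
>           "updateHandler.autoSoftCommit.maxTime": 3600000
>         }
>       }'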
>
> On Thu, Jun 20, 2024 at 5:15 PM matthew sporleder <msporle...@gmail.com>
> wrote:
>
>> Are you having iowait, gc pauses, or something else? Do you commit often
>> or in one big batch?
>>
>> > On Jun 20, 2024, at 12:26 AM, Saksham Gupta <
>> saksham.gu...@indiamart.com.invalid> wrote:
>> >
>> > Hi All,
>> >
>> > We have been facing extra load incidents due to higher GC count and GC
>> > time, causing higher response times and timeouts.
>> >
>> > Solr Cloud Cluster Details
>> >
>> > We use Solr Cloud v8.10 [with Java 8 and G1 GC] with 8 shards, where
>> > each shard is present on a single VM with 16 cores and 50 GB RAM. The
>> > size of each shard is ~28 GB and the Solr heap is 16 GB [heap
>> > utilization is only for the filter, document, and queryResults caches,
>> > each of size 512].
>> >
>> > Problem Details
>> >
>> > We pause indexing at 11 AM during peak searching hours. Normally the
>> > system remains stable during the peak hours, but when the document
>> > update count on solr is higher before peak hours [i.e. between 5.30 AM
>> > and 11 AM], we face multiple load issues. The GC count and GC time
>> > increase and CPU is consumed in GC itself, thereby increasing the load
>> > and response time of the system. To mitigate this, we recently
>> > increased the RAM on the servers [to 50 GB from 42 GB previously] so as
>> > to reduce the io wait from writing the solr index to memory multiple
>> > times. Taking a step further, we also increased the Solr heap from 12
>> > to 16 GB [we also tried other combinations like 14 GB, 15 GB, and 18
>> > GB]. Although we found some reduction in load issues due to lower io
>> > wait, the issue still recurs when heavier indexing is done.
>> >
>> > We have explored a few options like expungeDeletes, which may help
>> > reduce the percentage of deleted documents, but that cannot be executed
>> > close to peak hours, as it increases io wait, which further spikes the
>> > load and response time of solr significantly.
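>> >
>> > For reference, the operation in question is an update request with the
>> > expungeDeletes flag (host and collection name are placeholders):
>> >
>> >     curl "http://localhost:8983/solr/<collection>/update?commit=true&expungeDeletes=true"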
>> >
>> >
>> > 1. Apart from changing the expungeDeletes timing, is there another
>> >    option we can try to mitigate this problem?
>> >
>> > 2. Approximately 60 million documents are updated each day, i.e. ~30% of
>> >    the complete solr index is modified each day, while serving ~20
>> >    million search requests. We would appreciate any advice on how to
>> >    handle such high indexing + searching traffic during peak hours.
>>
>
