Hello Jan!
Thanks for your reply.
- Number of nodes and spec of each node
9 nodes, each with 24 vCPUs and 70GB RAM (28GB heap, the rest for the OS).
Disks are now 768GB SSDs (going from 512GB to 768GB made little difference,
really).
- Number of shards and replicas
1 shard, 9 replicas, i.e. each node holds all data (but consequently indexes
all data too). The logic being that, combined with
shards.preference=replica.location:local, nodes wouldn't need to proxy reads.
- Number of docs totally and per shard
1.6 million docs @ 20GB (excluding deleted docs)
- Update rate, and how you do commits?
Update rates vary throughout the day, ranging from 20 ops/sec to 300 ops/sec.
Commits are done using autoCommit on a 1-minute interval and softCommit on a
15-minute interval.
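For reference, that commit cadence corresponds to something like the following
solrconfig.xml fragment (a sketch only; openSearcher=false on the hard commit
is an assumption about the setup, not quoted from it):

```xml
<!-- Sketch of the commit cadence described above: hard commit every minute
     (without opening a new searcher), soft commit every 15 minutes for NRT
     visibility. Values are in milliseconds. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>          <!-- 1 minute -->
    <openSearcher>false</openSearcher> <!-- assumption: flush without reopening -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>900000</maxTime>         <!-- 15 minutes -->
  </autoSoftCommit>
</updateHandler>
```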
On 05/04/2021, 14:13, "Jan Høydahl" <[email protected]> wrote:
Karl,
Your screen shots were lost at my end, perhaps they did not make it to the
list. You may consider sharing graphics through an external service?
What do you mean by "heavy write"? Can you quantify your cluster in terms
of e.g.
- Number of nodes and spec of each node
- Number of shards and replicas
- Number of docs totally and per shard
- Update rate, and how you do commits?
Jan
> 1. apr. 2021 kl. 13:43 skrev Karl Stoney
<[email protected]>:
>
> Hi all.
> I’m looking for some opinions on how to best configure merges to run
optimally on GCP SSDs (network-attached). For context: we have a 9-node NRT
8.8.1 SolrCloud cluster, and each node has an index between 25 and 35GB in
size, depending on the current merge state / deleted docs. The index is both
heavy-write and heavy-read, so we’re typically always merging (which is
somewhat fine).
>
> Now the SSDs that we have are 512GB, and on GCP their performance scales with
the number of vCPUs and the amount of RAM. The disks we have are therefore
rated for:
>
> Sustained read IOPS: 15k
> Sustained write IOPS: 15k
> Sustained read throughput: 250MB/s
> Sustained write throughput: 250MB/s
>
> Both read and write can be sustained in parallel at the peak.
>
> Now what we observe, as you can see from this graph, is that we typically
have a mean write throughput of 16-20MB/s (way below our peak), but we’re also
peaking at above 250MB/s, which is causing us to get write-throttled:
> [write-throughput graph lost in transit]
> So really what I believe we need (if possible) is a configuration that is
less “bursty”, with writes spread out over a longer duration. As these are
network-attached disks, they suffer from initial IOPS latency, but their
sustained throughput is high.
>
> I’ve graphed the merge statistics out here; as you can see, at any given
time we have a maximum of 3 concurrent minor merges running, with the
occasional major. P95 on the minor merges is typically around 2 minutes, but
occasionally (correlating with a throttle on the above graphs) we can see a
minor merge taking 12-15 minutes.
> [merge-statistics graph lost in transit]
> Our index policy looks like this:
>
> <ramBufferSizeMB>512</ramBufferSizeMB>
> <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>   <int name="maxMergeAtOnce">10</int>
>   <int name="segmentsPerTier">10</int>
>   <int name="maxMergedSegmentMB">5000</int>
>   <int name="deletesPctAllowed">30</int>
> </mergePolicyFactory>
> <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>   <int name="maxThreadCount">10</int>
>   <int name="maxMergeCount">15</int>
>   <bool name="ioThrottle">true</bool>
> </mergeScheduler>
> <mergedSegmentWarmer class="org.apache.lucene.index.SimpleMergedSegmentWarmer"/>
>
> I feel like I’d be guessing which of these settings may help the scenario I
describe above, which is somewhat fine – I can experiment and measure. But the
feedback loop is relatively slow, so I wanted to lean on others’
experience/input first. My instinct is to perhaps lower `maxThreadCount`, but
seeing as we only ever peak at 3 in-progress merges, it feels like I’d have to
go low (2, or even 1), which is on par with spindle disks, which these aren’t.
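> For illustration, the sort of less-bursty tweak I have in mind might look
like this (hypothetical values, not something we’ve tested): a smaller
maxMergedSegmentMB to break merge write I/O into smaller chunks, and a lower
maxThreadCount to cap concurrent merge writes.

```xml
<!-- Illustrative sketch only: values are guesses to experiment with, not
     recommendations. Smaller merged segments spread write I/O into smaller
     bursts; fewer merge threads cap peak concurrent write throughput. -->
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
  <int name="maxMergedSegmentMB">2000</int> <!-- down from 5000: hypothetical -->
  <int name="deletesPctAllowed">30</int>
</mergePolicyFactory>
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">3</int>  <!-- down from 10: matches observed peak -->
  <int name="maxMergeCount">15</int>
  <bool name="ioThrottle">true</bool>
</mergeScheduler>
```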
>
> Thanks in advance for any help
>
> Unless expressly stated otherwise in this email, this e-mail is sent on
behalf of Auto Trader Limited Registered Office: 1 Tony Wilson Place,
Manchester, Lancashire, M15 4FN (Registered in England No. 03909628). Auto
Trader Limited is part of the Auto Trader Group Plc group. This email and any
files transmitted with it are confidential and may be legally privileged, and
intended solely for the use of the individual or entity to whom they are
addressed. If you have received this email in error please notify the sender.
This email message has been swept for the presence of computer viruses.