Have you considered TLOG/PULL replicas rather than NRT replicas?
That way, all the indexing happens on a single machine, and you can
use shards.preference to confine searches to the PULL replicas. See:
https://lucene.apache.org/solr/guide/7_7/distributed-requests.html
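As a rough SolrJ sketch (the ZooKeeper address, collection name, config set
name, and replica counts below are just placeholders, not recommendations),
the setup would look something like this:

    import java.util.Collections;
    import java.util.Optional;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class PullReplicaSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper address; adjust for your cluster.
            try (CloudSolrClient client = new CloudSolrClient.Builder(
                    Collections.singletonList("localhost:2181"), Optional.empty()).build()) {

                // 2 shards; per shard: 0 NRT, 1 TLOG replica (does the indexing),
                // 2 PULL replicas that only copy finished segments and serve queries.
                CollectionAdminRequest
                    .createCollection("mycollection", "myconfigset", 2, 0, 1, 2)
                    .process(client);

                // Route searches to the PULL replicas so query-time disk reads
                // happen away from the node doing the indexing.
                SolrQuery q = new SolrQuery("*:*");
                q.set("shards.preference", "replica.type:PULL");
                client.query("mycollection", q);
            }
        }
    }

With something like that in place, the TLOG leaders absorb the indexing and
merging IO, and the PULL replicas only pull finished segments and answer
queries.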

No, you can’t really limit the number of segments. While that seems like a
good idea, it quickly becomes counter-productive. Say you require that you
have 10 segments. Say each one becomes 10G. What happens when the 11th
segment is created and it’s 100M? Do you rewrite one of the 10G segments just
to add 100M? Your problem gets worse, not better.
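For reference, these are the TieredMergePolicy knobs that do exist at the
Lucene level (a minimal sketch, values purely illustrative): they shape how
merges are selected, not how many segments the index ends up with.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.TieredMergePolicy;

    public class MergePolicySketch {
        public static IndexWriterConfig configure() {
            TieredMergePolicy tmp = new TieredMergePolicy();
            tmp.setSegmentsPerTier(10.0);        // similar-sized segments allowed per tier
            tmp.setMaxMergeAtOnce(10);           // segments a single merge may combine
            tmp.setMaxMergedSegmentMB(5 * 1024); // skip merges that would exceed ~5G

            IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
            iwc.setMergePolicy(tmp);             // note: no setter caps the total segment count
            return iwc;
        }
    }

In Solr the same parameters are normally set through the TieredMergePolicyFactory
in solrconfig.xml rather than in code.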


Best,
Erick

> On Jun 5, 2020, at 1:41 AM, Anshuman Singh <singhanshuma...@gmail.com> wrote:
> 
> Hi Nicolas,
> 
> Commit happens automatically at 100k documents. We don't commit explicitly.
> We didn't limit the number of segments. There are 35+ segments in each core.
> But unrelated to the question, I would like to know if we can limit the
> number of segments in the core. I tried it in the past but the merge
> policies don't allow that.
> The TieredMergePolicy has two parameters, maxMergeAtOnce and
> segmentsPerTier. It seems like we cannot control the total number of
> segments, only the segments per tier (see
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
> ).
> 
> 
> On Thu, Jun 4, 2020 at 5:48 PM Nicolas Franck <nicolas.fra...@ugent.be>
> wrote:
> 
>> The real questions are:
>> 
>> * how often do you commit (either explicitly or automatically)?
>> * how many segments do you allow? If you only allow 1 segment,
>>   then that whole segment is recreated from the old documents plus the
>>   updates. And yes, that requires reading the old segment.
>>   It is common to allow multiple segments when you update often,
>>   so that updating does not interfere too much with reading the index.
>> 
>> 
>>> On 4 Jun 2020, at 14:08, Anshuman Singh <singhanshuma...@gmail.com>
>>> wrote:
>>> 
>>> I noticed that while indexing, when a commit happens, there is a lot of
>>> disk reading by Solr. The problem is that it impacts search performance
>>> when the index data needed by a query has to be loaded from disk, as the
>>> disk read speed is not very good and the whole index is not cached in RAM.
>>> 
>>> When no searching is performed, I noticed that the disk is usually read
>>> during commit operations, and sometimes even without a commit, at a low
>>> rate. I guess it is read due to segment merge operations. Can it be
>>> something else?
>>> If it is merging, can we limit disk IO during merging?
>> 
>> 
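On the last question in the quoted mail, about limiting disk IO during
merging: as far as I know there is no direct cap on merge read IO, but
Lucene's ConcurrentMergeScheduler lets you limit merge concurrency and it
adaptively throttles merge writes. A minimal Lucene-level sketch (values
illustrative; Solr exposes the same scheduler through the mergeScheduler
settings in solrconfig.xml, if I remember correctly):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.ConcurrentMergeScheduler;
    import org.apache.lucene.index.IndexWriterConfig;

    public class MergeIoSketch {
        public static IndexWriterConfig configure() {
            ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
            // Allow at most 4 pending merges and 1 merge thread; beyond that,
            // incoming indexing stalls rather than piling more IO onto the disk.
            cms.setMaxMergesAndThreads(4, 1);
            // Adaptive rate-limiting of merge writes (on by default in recent Lucene).
            cms.enableAutoIOThrottle();

            IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
            iwc.setMergeScheduler(cms);
            return iwc;
        }
    }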
