David:

Some of this still matters even with 7.5+. Prior to 7.5, you could easily have 
50% of your index consist of deleted docs. With 7.5, this ceiling is lower. 
expungeDeletes will reduce the proportion of deleted docs to no more than 10% 
while still respecting the default max segment size of 5GB. Optimizing and 
specifying maxSegments was getting you what you wanted, but more as a side 
effect; ya got lucky ;)...

You can set a bunch of parameters explicitly for TieredMergePolicy; some of the 
juicy ones are (see the sketch after this list):

- maxMergedSegmentMB (default 5000): raising it results in fewer, larger 
segments but doesn't materially affect the ratio of deleted docs.

- forceMergeDeletesPctAllowed (used in expungeDeletes, default 10%)

- deletesPctAllowed (used for "regular" merging, i.e. not optimize or 
expungeDeletes): the target ceiling for the percentage of deleted docs allowed 
in the index. It cannot be set below 20%.
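
If it helps to see those knobs in one place, here's a rough sketch that sets 
them through Lucene's TieredMergePolicy directly (in Solr you'd normally set 
the same properties on the merge policy factory in solrconfig.xml). The values 
shown are just the defaults discussed above:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;

public class MergePolicyDefaults {
    public static IndexWriterConfig configure() {
        TieredMergePolicy tmp = new TieredMergePolicy();
        tmp.setMaxMergedSegmentMB(5000.0);        // default; a bigger cap means fewer, larger segments
        tmp.setForceMergeDeletesPctAllowed(10.0); // the ceiling expungeDeletes works toward
        tmp.setDeletesPctAllowed(20.0);           // regular-merge ceiling; values below 20 are rejected
        return new IndexWriterConfig(new StandardAnalyzer()).setMergePolicy(tmp);
    }
}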


It's a balance between I/O and wasted space. The reason deletesPctAllowed is 
not allowed to go below 20% is that it's too easy to shoot yourself in the 
foot. Setting it to 5%, for instance, would send I/O (and CPU) through the 
roof; merging is an expensive operation. You can get a similar result by 
doing an occasional expungeDeletes rather than rewriting segments all the time.
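
If it's useful, here's a sketch of issuing that occasional expungeDeletes from 
SolrJ; the client and collection name are placeholders, not anything specific 
to this thread:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

// Commit and merge away segments whose deleted-doc percentage exceeds
// forceMergeDeletesPctAllowed, without doing a full optimize.
public void expungeDeletes(SolrClient client, String collection) throws Exception {
    UpdateRequest req = new UpdateRequest();
    req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true); // waitFlush, waitSearcher
    req.setParam("expungeDeletes", "true");
    req.process(client, collection);
}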

Ditto for the default value of forceMergeDeletesPctAllowed. Setting it to 1%, 
for instance, means doing a LOT of work for little gain.

Best,
Erick


> On Jun 7, 2019, at 2:44 PM, David Santamauro <david.santama...@gmail.com> 
> wrote:
> 
> I use the same algorithm, and for me initialMaxSegments is always the number 
> of segments currently in the index (seen, e.g., in the SOLR admin UI). 
> finalMaxSegments depends on what kind of updates have happened. If I know 
> that "older" documents are untouched, then I'll usually use -60% or even 
> -70%, depending on the initialMaxSegments. I have a few cores where I'll even 
> go all the way down to 1.
> 
> If you are going to attempt this, I'd suggest testing with a small reduction, 
> say 10 segments, and monitoring the index size and the difference between 
> maxDoc and numDocs (a sketch follows). I've shaved ~1T off of an index by 
> optimizing from 75 down to 30 segments (7T index total) and reduced a 
> significant % of deleted documents in the process. YMMV ...
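> 
> A rough SolrJ sketch for watching that maxDoc/numDocs gap via the Luke 
> handler; the client and core name are just placeholders:
> 
> import org.apache.solr.client.solrj.SolrClient;
> import org.apache.solr.client.solrj.request.LukeRequest;
> import org.apache.solr.client.solrj.response.LukeResponse;
> import org.apache.solr.common.util.NamedList;
> 
> // numDocs counts live docs, maxDoc counts live + deleted; the difference is
> // what merging/optimizing should shrink.
> public void reportDeletedDocs(SolrClient client, String core) throws Exception {
>     LukeResponse rsp = new LukeRequest().process(client, core);
>     NamedList<Object> index = rsp.getIndexInfo();
>     long numDocs = ((Number) index.get("numDocs")).longValue();
>     long maxDoc = ((Number) index.get("maxDoc")).longValue();
>     System.out.println("deleted docs: " + (maxDoc - numDocs));
> }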
> 
> If you are using a version of SOLR >=7.5 (see LUCENE-7976), this might all be 
> moot.
> 
> //
> 
> 
> On 6/7/19, 2:29 PM, "jena" <sthita2...@gmail.com> wrote:
> 
>    Thanks @Michael Joyner, how did you decide on an initialMaxSegments of 256?
>    Or is it some random number I can use for my case? Can you guide me on how
>    to decide the initial & final max segments?
> 
> 
>    Michael Joyner wrote
>> That is the way we do it here; it also helps a lot with not needing 2x or 
>> 3x the disk space to handle the merge:
>> 
>> public void solrOptimize() {
>>         int initialMaxSegments = 256;
>>         int finalMaxSegments = 4;
>>         if (isShowSegmentCounter()) {
>>             log.info("Optimizing ...");
>>         }
>>         try (SolrClient solrServerInstance = getSolrClientInstance()) {
>>             for (int segments = initialMaxSegments; segments >= 
>> finalMaxSegments; segments--) {
>>                 if (isShowSegmentCounter()) {
>>                     System.out.println("Optimizing to a max of " + 
>> segments + " segments.");
>>                 }
>>                 try {
>>                     solrServerInstance.optimize(true, true, segments);
>>                 } catch (RemoteSolrException | SolrServerException | 
>> IOException e) {
>>                     log.severe(e.getMessage());
>>                 }
>>             }
>>         } catch (IOException e) {
>>             throw new RuntimeException(e);
>>         }
>>     }
>> 
>> On 6/7/19 4:56 AM, Nicolas Franck wrote:
>>> In that case, hard optimisation like that is out the question.
>>> Resort to automatic merge policies, specifying a maximum
>>> amount of segments. Solr is created with multiple segments
>>> in mind. Hard optimisation seems like not worth the problem.
>>> 
>>> The problem is this: the fewer segments you specify for
>>> an optimisation, the longer it will take, because it has to read
>>> all of the segments to be merged and redo the sorting. And a cluster
>>> has a lot of housekeeping on top of that.
>>> 
>>> If you really want to issue an optimisation, then you can
>>> also do it in steps (using the max segments parameter):
>>> 
>>> 10 -> 9 -> 8 -> 7 ... -> 1
>>> 
>>> That way fewer segments need to be merged in one go.
>>> 
>>> Testing your index will show you what a good maximum
>>> number of segments is for your index.
>>> 
>>>> On 7 Jun 2019, at 07:27, jena <sthita2010@...> wrote:
>>>> 
>>>> Hello guys,
>>>> 
>>>> We have 4 solr(version 4.4) instance on production environment, which
>>>> are
>>>> linked/associated with zookeeper for replication. We do heavy deleted &
>>>> add
>>>> operations. We have around 26million records and the index size is
>>>> around
>>>> 70GB. We serve 100k+ requests per day.
>>>> 
>>>> 
>>>> Because of heavy indexing & deletion, we optimise solr instance
>>>> everyday,
>>>> because of that our solr cloud getting unstable , every solr instance go
>>>> on
>>>> recovery mode & our search is getting affected & very slow because of
>>>> that.
>>>> Optimisation takes around 1hr 30minutes.
>>>> We are not able fix this issue, please help.
>>>> 
>>>> Thanks & Regards
>>>> 
>>>> 
>>>> 
> 
> 
> 
> 
> 
> 
