I use the same algorithm, and for me initialMaxSegments is always the number of 
segments currently in the index (seen, e.g., in the Solr admin UI). 
finalMaxSegments depends on what kind of updates have happened. If I know that 
"older" documents are untouched, then I'll usually reduce by 60% or even 70%, 
depending on the initialMaxSegments. I have a few cores that I'll even take all 
the way down to 1.
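
In case it helps, here's a rough SolrJ sketch of how those two numbers could be 
derived programmatically. It assumes the Luke handler (/admin/luke) is enabled 
and that its index section reports "segmentCount" (true on recent versions; on 
older ones just read the count off the admin UI), uses the newer HttpSolrClient 
builder, and the core URL is made up:

import java.io.IOException;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.common.util.NamedList;

public class SegmentTargets {
    public static void main(String[] args) throws SolrServerException, IOException {
        // Hypothetical core URL -- substitute your own.
        try (SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycore").build()) {
            LukeRequest luke = new LukeRequest();   // hits /admin/luke by default
            luke.setNumTerms(0);                    // index-level stats only
            NamedList<Object> index = luke.process(client).getIndexInfo();

            // Start from however many segments the index has right now ...
            int initialMaxSegments = ((Number) index.get("segmentCount")).intValue();
            // ... and aim for roughly a 60% reduction, never below 1.
            int finalMaxSegments = Math.max(1, (int) Math.round(initialMaxSegments * 0.4));

            System.out.println("initialMaxSegments=" + initialMaxSegments
                    + ", finalMaxSegments=" + finalMaxSegments);
        }
    }
}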

If you are going to attempt this, I'd suggest testing with a small reduction, 
say 10 segments, and monitoring the index size and the difference between 
maxDoc and numDocs. I've shaved ~1T off of an index by optimizing from 75 down 
to 30 segments (7T index total) and purged a significant percentage of deleted 
documents in the process. YMMV ...
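
The same Luke response gives you what you need to watch the maxDoc/numDocs gap 
while you step down. A minimal helper along the same lines (reusing the imports 
and client from the sketch above):

// Reuses the imports and SolrClient from the sketch above. Reports how much of
// the index is deleted documents -- i.e. the maxDoc vs. numDocs gap to watch.
static void reportDeletedDocs(SolrClient client) throws SolrServerException, IOException {
    LukeRequest luke = new LukeRequest();
    luke.setNumTerms(0);
    NamedList<Object> index = luke.process(client).getIndexInfo();

    long numDocs = ((Number) index.get("numDocs")).longValue();  // live docs
    long maxDoc  = ((Number) index.get("maxDoc")).longValue();   // live + deleted
    double deletedPct = maxDoc == 0 ? 0.0 : 100.0 * (maxDoc - numDocs) / maxDoc;

    System.out.printf("numDocs=%d maxDoc=%d deleted=%.1f%%%n", numDocs, maxDoc, deletedPct);
}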

If you are using Solr >= 7.5 (see LUCENE-7976), this might all be moot.



On 6/7/19, 2:29 PM, "jena" <sthita2...@gmail.com> wrote:

    Thanks @Michael Joyner, how did you decide on 256 for initialMaxSegments? Or
    is it some arbitrary number I can also use for my case? Can you guide me on
    how to decide the initial & final max segments?
    
     
    Michael Joyner wrote
    > That is the way we do it here - also helps a lot with not needing x2 or 
    > x3 disk space to handle the merge:
    > 
    > public void solrOptimize() {
    >     int initialMaxSegments = 256;
    >     int finalMaxSegments = 4;
    >     if (isShowSegmentCounter()) {
    >         log.info("Optimizing ...");
    >     }
    >     try (SolrClient solrServerInstance = getSolrClientInstance()) {
    >         for (int segments = initialMaxSegments; segments >= finalMaxSegments; segments--) {
    >             if (isShowSegmentCounter()) {
    >                 System.out.println("Optimizing to a max of " + segments + " segments.");
    >             }
    >             try {
    >                 // optimize(waitFlush, waitSearcher, maxSegments)
    >                 solrServerInstance.optimize(true, true, segments);
    >             } catch (RemoteSolrException | SolrServerException | IOException e) {
    >                 log.severe(e.getMessage());
    >             }
    >         }
    >     } catch (IOException e) {
    >         throw new RuntimeException(e);
    >     }
    > }
    > 
    > On 6/7/19 4:56 AM, Nicolas Franck wrote:
    >> In that case, hard optimisation like that is out of the question.
    >> Resort to automatic merge policies, specifying a maximum
    >> number of segments. Solr is designed with multiple segments
    >> in mind. Hard optimisation hardly seems worth the trouble.
    >>
    >> The problem is this: the fewer segments you specify during
    >> an optimisation, the longer it will take, because Solr has to read
    >> all of the segments to be merged and redo the sorting. And a cluster
    >> has a lot of housekeeping on top of that.
    >>
    >> If you really want to issue an optimisation, then you can
    >> also do it in steps (via the max segments parameter):
    >>
    >> 10 -> 9 -> 8 -> 7 .. -> 1
    >>
    >> That way fewer segments need to be merged in one go.
    >>
    >> Testing your index will show you what a good maximum
    >> number of segments is for your index.
    >>
    >>> On 7 Jun 2019, at 07:27, jena <sthita2010@...> wrote:
    >>>
    >>> Hello guys,
    >>>
    >>> We have 4 Solr (version 4.4) instances in our production environment,
    >>> which are linked/associated with ZooKeeper for replication. We do heavy
    >>> delete & add operations. We have around 26 million records and the index
    >>> size is around 70GB. We serve 100k+ requests per day.
    >>>
    >>>
    >>> Because of the heavy indexing & deletion, we optimise the Solr instances
    >>> every day. Because of that our Solr cloud gets unstable: every Solr
    >>> instance goes into recovery mode & our search is affected & very slow.
    >>> Optimisation takes around 1hr 30 minutes.
    >>> We are not able to fix this issue, please help.
    >>>
    >>> Thanks & Regards
    >>>
    >>>
    >>> --
    >>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
    
    
    
    
    
