So is this new optimize maxSegments / commit expungeDeletes behavior in 7.5? My
experience, and I watch my optimize process very closely, is that using
maxSegments does not touch every segment with a deleted document.
expungeDeletes, by contrast, merges every segment with deleted documents that
was touched by that commit.
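For reference, the two operations being compared are separate requests to Solr's update handler. The Python sketch below only builds the request URLs and sends nothing. The host, port, and collection name `mycollection` are hypothetical placeholders. `optimize`, `maxSegments`, `commit` and `expungeDeletes` are the standard update-handler request parameters.

```python
from urllib.parse import urlencode

# Hypothetical base URL and collection name; adjust for a real install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"


def optimize_url(max_segments: int) -> str:
    """Build an optimize request that merges down to at most max_segments."""
    params = {"optimize": "true", "maxSegments": max_segments}
    return f"{SOLR_UPDATE_URL}?{urlencode(params)}"


def expunge_deletes_url() -> str:
    """Build a commit request that also asks Solr to expunge deleted docs."""
    params = {"commit": "true", "expungeDeletes": "true"}
    return f"{SOLR_UPDATE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Only prints the URLs; no request is sent to a server.
    print(optimize_url(74))
    print(expunge_deletes_url())
```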

After reading LUCENE-7976, it seems this is, indeed, new behavior.


On 6/7/19, 10:31 AM, "Erick Erickson" <erickerick...@gmail.com> wrote:

    Optimizing guarantees that there will be _no_ deleted documents in an index 
when done. If a segment has even one deleted document, it’s merged, no matter 
what you specify for maxSegments. 
    
    Segments are write-once, so to remove deleted data from a segment it must
at least be rewritten into a new segment, whether or not it’s merged with
another segment on optimize.
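A toy model of the write-once point, not Lucene's actual data structures. A frozen `Segment` (a hypothetical class) can't have its deletes removed in place. Purging them means writing a new segment that contains only the live documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Toy stand-in for a Lucene segment: immutable once written."""
    name: str
    doc_ids: tuple[int, ...]   # every document stored in the segment
    deleted: frozenset[int]    # ids only *marked* deleted, still stored

    @property
    def live_docs(self) -> tuple[int, ...]:
        return tuple(d for d in self.doc_ids if d not in self.deleted)


def rewrite(seg: Segment, new_name: str) -> Segment:
    """Drop deleted docs by copying only live docs into a brand-new segment."""
    return Segment(new_name, seg.live_docs, frozenset())


if __name__ == "__main__":
    old = Segment("_a", (1, 2, 3, 4), frozenset({2}))
    new = rewrite(old, "_b")
    print(old)  # unchanged: still holds doc 2, marked deleted
    print(new)  # holds only docs 1, 3, 4 and no deletes
```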
    
    expungeDeletes does _not_ merge every segment that has deleted documents.
It merges segments that have more than 10% (the default) deleted documents. If your
index happens to have all segments with more than 10% deleted docs, then it will,
indeed, merge all of them.
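A simplified Python sketch of which segments each operation picks, following the description above. A 7.5+ optimize rewrites any segment with at least one delete. expungeDeletes only picks segments whose deleted percentage exceeds a threshold. `pct_allowed` mirrors Lucene TieredMergePolicy's `forceMergeDeletesPctAllowed` (default 10.0). The class and function names are hypothetical. The real merge policy also groups segments into merges and enforces size limits, which this ignores.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    name: str
    max_doc: int    # documents in the segment, including deleted ones
    del_count: int  # documents marked deleted

    @property
    def pct_deleted(self) -> float:
        return 100.0 * self.del_count / self.max_doc if self.max_doc else 0.0


def optimize_candidates(segments):
    """7.5+ optimize as described above: any segment with a delete is rewritten."""
    return [s.name for s in segments if s.del_count > 0]


def expunge_deletes_candidates(segments, pct_allowed=10.0):
    """expungeDeletes: only segments whose deleted percentage exceeds the threshold."""
    return [s.name for s in segments if s.pct_deleted > pct_allowed]


if __name__ == "__main__":
    segs = [
        SegmentStats("_0", 1000, 0),    # no deletes
        SegmentStats("_1", 1000, 50),   # 5% deleted
        SegmentStats("_2", 1000, 200),  # 20% deleted
    ]
    print(optimize_candidates(segs))         # ['_1', '_2']
    print(expunge_deletes_candidates(segs))  # ['_2']
```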
    
    In your example, if you look closely you should find that all segments that 
had any deleted documents were written (merged) to new segments. I’d expect 
that segments with _no_ deleted documents might mostly be left alone. And two 
of the segments were chosen to merge together.
    
    See LUCENE-7976 for a long discussion of how this changed starting with
Solr 7.5.
    
    Best,
    Erick
    
    > On Jun 7, 2019, at 7:07 AM, David Santamauro <david.santama...@gmail.com> 
wrote:
    > 
    > Erick, on 6.0.1, optimize with maxSegments only merges down to the 
specified number. E.g., given an index with 75 segments, optimize with 
maxSegments=74 will only merge 2 segments leaving 74 segments. It will choose a 
segment to merge that has deleted documents, but does not merge every segment 
with deleted documents.
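The arithmetic behind the 75-to-74 example, as a minimal sketch. It assumes the reduction happens in a single merge and ignores deletions entirely; both are simplifications. Merging k segments into one removes k - 1 segments, so reaching `max_segments` takes `current - max_segments + 1` of them. The function name is hypothetical.

```python
def segments_to_merge(current: int, max_segments: int) -> int:
    """Segments that must be combined into one to get down to max_segments."""
    if max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    if current <= max_segments:
        return 0  # already within the limit; nothing is forced to merge
    return current - max_segments + 1


if __name__ == "__main__":
    print(segments_to_merge(75, 74))  # 2, matching the 6.0.1 observation
```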
    > 
    > I think you are thinking about the expungeDeletes parameter on the commit 
request. That will merge every segment that has a deleted document.
    > 
    > 
    > On 6/7/19, 10:00 AM, "Erick Erickson" <erickerick...@gmail.com> wrote:
    > 
    >    This isn’t quite right. Solr will rewrite _all_ segments that have 
_any_ deleted documents in them when optimizing, even one. Given your 
description, I’d guess that all your segments will have deleted documents, so 
even if you do specify maxSegments on the optimize command, the entire index 
will be rewritten.
    > 
    >    You’re in a bind, see:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/.
You have this one massive segment and it will _not_ be merged until it’s
almost all deleted documents; see the link above for a fuller explanation.
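A back-of-the-envelope sketch of why the big segment is stuck. It assumes the pre-7.5 TieredMergePolicy behaviour that the linked article describes: a segment is passed over for ordinary merging while its live (non-deleted) size is more than half of `maxMergedSegmentMB`, whose default is 5 GB. The function name is hypothetical. Sizes are treated as proportional to document counts, which is only approximately true.

```python
# TieredMergePolicy's default maxMergedSegmentMB is 5 GB.
DEFAULT_MAX_MERGED_SEGMENT_GB = 5.0


def deleted_fraction_before_eligible(
    segment_gb: float, max_merged_gb: float = DEFAULT_MAX_MERGED_SEGMENT_GB
) -> float:
    """Fraction of a segment that must be deleted before its live size drops
    to half of max_merged_gb (simplified pre-7.5 eligibility rule)."""
    threshold_gb = max_merged_gb / 2.0
    if segment_gb <= threshold_gb:
        return 0.0  # small enough to be a normal merge candidate already
    return 1.0 - threshold_gb / segment_gb


if __name__ == "__main__":
    print(deleted_fraction_before_eligible(100.0))  # roughly 0.975
```

With the defaults, a 100 GB segment left behind by an earlier optimize would need roughly 97.5% of its content deleted before it becomes a merge candidate again.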
    > 
    >    Prior to Solr 7.5 you don’t have many options except to reindex and
_not_ optimize. So if possible I’d reindex from scratch into a new collection
and not optimize, or restructure your process so that you can optimize in
a quiet period when little indexing is going on.
    > 
    >    Best,
    >    Erick
    > 
    >> On Jun 7, 2019, at 2:51 AM, jena <sthita2...@gmail.com> wrote:
    >> 
    >> Thanks @Nicolas Franck for the reply. I don't see any segment info for the 4.4
    >> version. Is there any API I can use to get my segment information? Will try
    >> to use maxSegments and see if it can help us during optimization.
    >> 
    >> 
    >> 
    >> --
    >> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
    > 
    > 
    
    
