> On Jun 7, 2019, at 7:53 AM, David Santamauro <david.santama...@gmail.com> 
> wrote:
> 
> So is this new optimize maxSegments / commit expungeDeletes behavior in 7.5? 
> My experience, and I watch my optimize process very closely, is that 
> using maxSegments does not touch every segment with a deleted document. 
> expungeDeletes merges all segments that have deleted documents that have been 
> touched with said commit.
> 

Which part? 

What's different in 7.5 is that an optimize that doesn’t specify 
maxSegments will remove all deleted docs from an index without creating massive 
segments. Prior to 7.5, a simple optimize would create a single segment by 
default, no matter how large.
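
To make the requests being compared concrete, here is a minimal sketch that only builds the update URLs; it does not contact Solr. The host and core name are made up, and the helper name update_url is just for illustration. optimize, maxSegments, commit and expungeDeletes are the request parameters discussed in this thread.

```python
from urllib.parse import urlencode

def update_url(base, core, **params):
    """Build an update-handler URL; Python booleans become 'true'/'false'."""
    query = {k: str(v).lower() if isinstance(v, bool) else v
             for k, v in params.items()}
    return f"{base}/{core}/update?{urlencode(query)}"

BASE = "http://localhost:8983/solr"  # hypothetical host
CORE = "mycore"                      # hypothetical core name

# Plain optimize, no maxSegments
print(update_url(BASE, CORE, optimize=True))
# Optimize with an explicit segment target
print(update_url(BASE, CORE, optimize=True, maxSegments=74))
# Commit that asks for deletes to be expunged
print(update_url(BASE, CORE, commit=True, expungeDeletes=True))
```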

If, after the end of an optimize on a quiescent index, you see a difference 
between maxDoc and numDocs (or deletedDocs > 0) for a core, then that’s 
entirely unexpected for any version of Solr. NOTE: If you are actively 
indexing while optimizing, you may see deleted docs in your index after optimize, 
since optimize works on the segments it sees when the operation starts.
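
That check is just arithmetic on the two counters. A rough sketch, assuming you have already read numDocs and maxDoc for the core (for example from the admin UI); the dict shape and the numbers below are made up for illustration:

```python
def deleted_docs(index_stats):
    """Deleted docs still occupying space in the index: maxDoc - numDocs."""
    return index_stats["maxDoc"] - index_stats["numDocs"]

def is_fully_purged(index_stats):
    """True when no deleted docs remain, as expected after optimizing a quiescent index."""
    return deleted_docs(index_stats) == 0

before = {"numDocs": 9_500, "maxDoc": 10_000}  # made-up counts before optimize
after = {"numDocs": 9_500, "maxDoc": 9_500}    # made-up counts after optimize

print(deleted_docs(before), is_fully_purged(before))  # 500 False
print(deleted_docs(after), is_fully_purged(after))    # 0 True
```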

expungeDeletes has always, IIUC, defaulted to merging only segments with > 10% 
deleted docs.
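
As a toy model of that difference (this is not Lucene's merge policy code, only the selection rule as described in this thread), the sketch below picks segments for expungeDeletes by the > 10% threshold and, for contrast, picks every segment with at least one deleted doc the way optimize is described. Segment names and counts are invented.

```python
def expunge_candidates(segments, pct_allowed=10.0):
    """Segments whose deleted-doc percentage exceeds the threshold."""
    return [name for name, max_doc, deleted in segments
            if max_doc > 0 and 100.0 * deleted / max_doc > pct_allowed]

def optimize_candidates(segments):
    """Segments with any deleted doc at all; these get rewritten on optimize."""
    return [name for name, _max_doc, deleted in segments if deleted > 0]

# (name, maxDoc, deleted docs): invented sample data
segments = [
    ("_a", 1000, 50),   # 5% deleted
    ("_b", 1000, 150),  # 15% deleted
    ("_c", 200, 0),     # no deletes
    ("_d", 400, 41),    # 10.25% deleted
]

print(expunge_candidates(segments))   # ['_b', '_d']
print(optimize_candidates(segments))  # ['_a', '_b', '_d']
```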

Best,
Erick

> After reading LUCENE-7976, it seems this is, indeed, new behavior.
> 
> 
> On 6/7/19, 10:31 AM, "Erick Erickson" <erickerick...@gmail.com> wrote:
> 
>    Optimizing guarantees that there will be _no_ deleted documents in an 
> index when done. If a segment has even one deleted document, it’s merged, no 
> matter what you specify for maxSegments. 
> 
>    Segments are write-once, so to remove deleted data from a segment it must 
> be at least rewritten into a new segment, whether or not it’s merged with 
> another segment on optimize.
> 
>    expungeDeletes does _not_ merge every segment that has deleted documents. 
> It merges segments that have > 10% (the default) deleted documents. If your 
> index happens to have all segments with > 10% deleted docs, then it will, 
> indeed, merge all of them.
> 
>    In your example, if you look closely you should find that all segments 
> that had any deleted documents were written (merged) to new segments. I’d 
> expect that segments with _no_ deleted documents might mostly be left alone. 
> And two of the segments were chosen to merge together.
> 
>    See LUCENE-7976 for a long discussion of how this changed starting with 
> Solr 7.5.
> 
>    Best,
>    Erick
> 
>> On Jun 7, 2019, at 7:07 AM, David Santamauro <david.santama...@gmail.com> 
>> wrote:
>> 
>> Erick, on 6.0.1, optimize with maxSegments only merges down to the specified 
>> number. E.g., given an index with 75 segments, optimize with maxSegments=74 
>> will only merge 2 segments leaving 74 segments. It will choose a segment to 
>> merge that has deleted documents, but does not merge every segment with 
>> deleted documents.
>> 
>> I think you are thinking about the expungeDeletes parameter on the commit 
>> request. That will merge every segment that has a deleted document.
>> 
>> 
>> On 6/7/19, 10:00 AM, "Erick Erickson" <erickerick...@gmail.com> wrote:
>> 
>>   This isn’t quite right. Solr will rewrite _all_ segments that have _any_ 
>> deleted documents in them when optimizing, even one. Given your description, 
>> I’d guess that all your segments will have deleted documents, so even if you 
>> do specify maxSegments on the optimize command, the entire index will be 
>> rewritten.
>> 
>>   You’re in a bind, see: 
>> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/.
>>  You have this one massive segment and it will _not_ be merged until it’s 
>> almost all deleted documents, see the link above for a fuller explanation.
>> 
>>   Prior to Solr 7.5 you don’t have many options except to re-index and _not_ 
>> optimize. So if possible I’d reindex from scratch into a new collection and 
>> do not optimize. Or restructure your process such that you can optimize in a 
>> quiet period when little indexing is going on.
>> 
>>   Best,
>>   Erick
>> 
>>> On Jun 7, 2019, at 2:51 AM, jena <sthita2...@gmail.com> wrote:
>>> 
>>> Thanks @Nicolas Franck for the reply. I don't see any segment info for the 4.4
>>> version. Is there any API I can use to get my segment information? Will try
>>> to use maxSegments and see if it can help us during optimization.
>>> 
>>> 
>>> 
>>> --
>>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>> 
>> 
> 
> 
