Also, terms whose documents have been deleted are not purged, so you
can merge all you like and the index will not shrink back completely.
Only an optimize will remove the "orphan" terms.

This matters because the orphan terms affect relevance calculations,
so you really want to purge them with an optimize. If a full optimize
is too expensive, you can do limited optimize passes with the
'maxSegments' option to the optimize command.

http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22_and_.22optimize.22
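
As a rough illustration of a limited optimize pass, here is a minimal
Python sketch that builds the XML update message with the
'maxSegments' attribute. The helper names and the update URL
(http://localhost:8983/solr/update) are assumptions for the example,
not something from this thread; adjust them to your own setup.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_optimize_message(max_segments):
    """Build an <optimize/> update message limited to max_segments segments."""
    if max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    elem = ET.Element("optimize", {"maxSegments": str(max_segments)})
    return ET.tostring(elem, encoding="unicode")


def post_update(xml_body, url="http://localhost:8983/solr/update"):
    """POST an XML update message to Solr. The URL is an assumed default."""
    request = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Ask for a partial optimize down to at most 4 segments.
message = build_optimize_message(4)
print(message)
# To actually send it (needs a running Solr at the assumed URL):
# post_update(message)
```

Running a few passes with a decreasing maxSegments value spreads the
cost out instead of paying for one full optimize at once.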

On Fri, Nov 20, 2009 at 11:37 AM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Fri, Nov 20, 2009 at 2:32 PM, Michael <solrco...@gmail.com> wrote:
>> On Fri, Nov 20, 2009 at 12:35 PM, Yonik Seeley
>> <yo...@lucidimagination.com> wrote:
>>> On Fri, Nov 20, 2009 at 12:24 PM, Michael <solrco...@gmail.com> wrote:
>>>> So -- I thought I understood you to mean that if I frequently merge,
>>>> it's basically the same as an optimize, and cruft will get purged.  Am
>>>> I misunderstanding you?
>>>
>>> That only applies to the segments involved in the merge.  The deleted
>>> documents are left behind when old segments are merged into a new
>>> segment.
>>
>> Your statement is leading me to believe that I have misunderstood the
>> merge process.  I thought that every time there are 10 segments, they
>> get merged down to 1.  Therefore, every time a merge happens, every
>> single segment in my entire index is "involved in the merge".  9
>> segments later, we're back to 10 segments, and they're merged into 1.
>> 9 segments later, we're back to 10 segments once again, and they're
>> merged into 1.
>>
>> Maybe I have misunderstood the mergeFactor docs.  Maybe instead it's like
>> this?
>> 1. Segment A1 fills with N docs, and a new segment A2 is created.
>> 2. A2 fills with N docs, and A3 is created; A3 fills with N docs, etc.
>> 3. A9 fills with N docs, and merging occurs: Segment B1 is created
>> with 10*N docs, segments A1-A9 are deleted.
>> 4. A new segment A1 fills with N docs, and a new segment A2 is
>> created; B1 is still sitting with 10*N docs.
>> 5. Eventually A1 through A9 each have N docs, and then merging occurs:
>> Segment B2 is created, with 10*N docs.
>> 6. Eventually Segments B1 through B9 each have 10*N docs, and merging
>> occurs: Segment C1 is created, with 100*N docs.  Segments B1-B9 are
>> deleted.
>> 7. A new A1 starts filling again.
>>
>> Some time down the line I might have 4 D segments with 1000*N docs
>> each, 6 C segments with 100*N docs each, 8 B segments with 10*N docs
>> each, 2 A segments with N docs each, and an open A3 segment filling
>> up.
>>
>> If this is right, then your statement above means that yes, each merge
>> of many As into 1 B purges all the deleted docs in A1-A9, but all my
>> Ds, Cs, and Bs aren't updated to purge deleted docs yet.  Only when
>> B1-B9 merge into a new C do their deleted docs get purged; only when
>> C1-C9 merge into a new D do their deleted docs get purged; etc.
>>
>> Is this right?  Sorry it was so verbose!
>
> Yep, that's right.
>
> -Yonik
> http://www.lucidimagination.com
>
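
The tiered merging confirmed above can be sketched with a toy
simulation. This is a simplified model under stated assumptions, not
Lucene's actual merge policy code: the Segment class, add_segment and
delete_docs are hypothetical names, a merge fires whenever mergeFactor
segments exist at the same level, and a merge drops the deleted docs
of only the segments it consumes.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    level: int        # 0 = "A" segments, 1 = "B", 2 = "C", ...
    live: int         # documents still visible to searches
    deleted: int = 0  # documents marked deleted but not yet purged


def delete_docs(segment, count):
    """Mark `count` live docs as deleted; they stay in the segment."""
    segment.live -= count
    segment.deleted += count


def add_segment(segments, live, merge_factor=10):
    """Add a freshly flushed level-0 segment, then cascade merges upward."""
    segments.append(Segment(level=0, live=live))
    level = 0
    while True:
        same_level = [s for s in segments if s.level == level]
        if len(same_level) < merge_factor:
            break
        # The merge copies only live docs, so deletions in these
        # segments (and only these) are purged.
        merged = Segment(level=level + 1, live=sum(s.live for s in same_level))
        segments[:] = [s for s in segments if s.level != level]
        segments.append(merged)
        level += 1


segments = []
for _ in range(10):
    add_segment(segments, live=100)   # ten A segments merge into B1

b1 = segments[0]
delete_docs(b1, 300)                  # deletions land in the big B1 segment

for _ in range(9):
    add_segment(segments, live=100)   # nine new A segments
for seg in segments:
    if seg.level == 0:
        delete_docs(seg, 10)          # a few deletions in each A segment
add_segment(segments, live=100)       # tenth A segment triggers a merge into B2

for seg in segments:
    print(seg)
# B1 still carries its 300 deleted docs; B2 carries none.
```

In this model the deletions in the small segments disappear at the
next merge, while those in B1 wait until enough B segments accumulate
to merge into a C segment, or until an optimize rewrites everything.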



-- 
Lance Norskog
goks...@gmail.com
