First, there's no reason to optimize this often, if at all. Older
versions of Lucene searched faster on an optimized index, but
this is no longer necessary. Optimize will reclaim the space held by
deleted documents, but it's generally best done rarely, and
at off-peak hours.

Note that optimize will re-write your entire index into a single new segment,
so following your pattern it'll take longer and longer each time.

But the speed change happening at 500,000 documents is suspiciously
close to the default mergeFactor of 10 times your 50,000-document
batches. Do subsequent optimizes (e.g. at the 750,000th document) still
take that long? Then again, this doesn't make sense, because if you're
optimizing instead of committing, each optimize should reduce your
index to 1 segment and you'll never hit a merge.

So I'm a little confused. If you're really optimizing every 50K docs, what
I'd expect to see is successively longer times, and at the end of each
optimize I'd expect there to be only one segment in your index.
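
To make the two scenarios concrete, here's a rough sketch in Python. It is a toy model, not Lucene's actual merge policy: it assumes each 50K batch lands as exactly one new segment, and that a merge fires only when mergeFactor segments of the same size exist. The function name simulate and its parameters are made up for illustration.

```python
def simulate(total_docs, batch, merge_factor, optimize):
    """Return a list of (docs_indexed, segment_count, docs_rewritten) per batch.

    Toy model only: one new segment per batch, and merges happen when
    merge_factor segments of equal size accumulate.
    """
    segments = []  # segment sizes, in documents
    history = []
    for indexed in range(batch, total_docs + 1, batch):
        segments.append(batch)  # assumption: each batch becomes one segment
        rewritten = 0
        if optimize:
            # Optimize rewrites everything into a single segment.
            if len(segments) > 1:
                rewritten = sum(segments)
                segments = [rewritten]
        else:
            # Commit only: merge whenever merge_factor equal-sized segments exist.
            merged = True
            while merged:
                merged = False
                for size in sorted(set(segments)):
                    if segments.count(size) >= merge_factor:
                        for _ in range(merge_factor):
                            segments.remove(size)
                        segments.append(size * merge_factor)
                        rewritten += size * merge_factor
                        merged = True
                        break
        history.append((indexed, len(segments), rewritten))
    return history


if __name__ == "__main__":
    for mode in (False, True):
        print("optimize" if mode else "commit only")
        for row in simulate(600_000, 50_000, 10, optimize=mode):
            print("  docs=%d segments=%d rewritten=%d" % row)
```

In this model, commit-only indexing rewrites nothing until the 10th batch, where it merges 500,000 docs at once. Optimizing every batch keeps the index at one segment and rewrites a steadily growing number of docs each time, with no special step at 500,000.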

Are you sure you're not just seeing successively longer times on each
optimize and just noticing it after 10?
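
One way to check is to compute the elapsed time between the "merge: total" line and the "merge store matchedCount" line for every optimize, not just the 10th and 11th. Here's a minimal sketch using the four timestamps from your message. The helper name elapsed_seconds is made up, and the format string assumes English month names and AM/PM as they appear in your log.

```python
from datetime import datetime

# Timestamp layout as it appears in the quoted INFOSTREAM.txt lines.
FMT = "%b %d, %Y %I:%M:%S %p"


def elapsed_seconds(start, end):
    """Seconds between two log timestamps written in FMT."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# (merge start, merge store line) pairs copied from the quoted log.
merges = {
    "500000 docs": ("Jun 19, 2011 4:03:59 AM", "Jun 19, 2011 4:04:37 AM"),
    "550000 docs": ("Jun 19, 2011 4:37:06 AM", "Jun 19, 2011 4:46:42 AM"),
}

if __name__ == "__main__":
    for label, (start, end) in merges.items():
        print(label, elapsed_seconds(start, end), "s")
```

For the two merges you posted this gives 38 s and 576 s. Running the same calculation over all the earlier optimizes would show whether the times creep up steadily or really do jump at one point.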

Best
Erick

On Sun, Jun 19, 2011 at 6:04 AM, Santiago Bazerque <sbazer...@gmail.com> wrote:
> Hello!
>
> Here is a puzzling experiment:
>
> I build an index of about 1.2MM documents using SOLR 3.1. The index has a
> large number of dynamic fields (about 15.000). Each document has about 100
> fields.
>
> I add the documents in batches of 20, and every 50.000 documents I optimize
> the index.
>
> The first 10 optimizes (up to exactly 500k documents) take less than a
> minute and a half.
>
> But the 11th and all subsequent commits take north of 10 minutes. The commit
> logs look identical (in the INFOSTREAM.txt file), but what used to be
>
>   Jun 19, 2011 4:03:59 AM IW 13 [Sun Jun 19 04:03:59 EDT 2011; Lucene Merge
> Thread #0]: merge: total 500000 docs
>
> Jun 19, 2011 4:04:37 AM IW 13 [Sun Jun 19 04:04:37 EDT 2011; Lucene Merge
> Thread #0]: merge store matchedCount=2 vs 2
>
>
> now eats a lot of time:
>
>
>   Jun 19, 2011 4:37:06 AM IW 14 [Sun Jun 19 04:37:06 EDT 2011; Lucene Merge
> Thread #0]: merge: total 550000 docs
>
> Jun 19, 2011 4:46:42 AM IW 14 [Sun Jun 19 04:46:42 EDT 2011; Lucene Merge
> Thread #0]: merge store matchedCount=2 vs 2
>
>
> What could be happening between those two lines that takes 10 minutes at
> full CPU? (and with 50k docs less used to take so much less?).
>
>
> Thanks in advance,
>
> Santiago
>
