In general, do not optimize unless you
1> have a very static index
2> actually test the search performance afterwards.

First, as Andrew says, optimizing will force a complete
copy of the entire index at replication. If you do NOT
optimize, only the most recent segments to be written
are copied.
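
To make the replication point concrete, here is a toy model
in Python. It is not Solr code: the segment names and sizes
are invented, and it simply assumes that replication ships
only the files the slave doesn't already have.

```python
# Toy model of segment-based replication. Not Solr code: segment names
# and sizes (in MB) are made up for illustration.

def files_to_copy(master, slave):
    """Return the segments on the master that the slave does not have yet."""
    return {name: size for name, size in master.items() if name not in slave}

def optimize(index, new_name):
    """Model an optimize: every segment is rewritten into one new segment."""
    return {new_name: sum(index.values())}

# The slave starts out in sync with the master.
master = {"seg_1": 500, "seg_2": 300, "seg_3": 50}
slave = dict(master)

# Case 1: a new batch is indexed, no optimize. Only the new segment ships.
master_after_add = dict(master, seg_4=20)
incremental_mb = sum(files_to_copy(master_after_add, slave).values())

# Case 2: same batch, followed by an optimize. The single merged segment
# is new to the slave, so the whole index ships.
master_after_optimize = optimize(master_after_add, "seg_5")
full_mb = sum(files_to_copy(master_after_optimize, slave).values())

print(incremental_mb, full_mb)  # 20 870
```

In this model the optimized master shares no files with the
slave, so the slave has to pull everything again.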

Second, unless you have quite a large number of
segments, optimizing, despite its cool-sounding name,
doesn't buy you much. In fact there's a JIRA to
rename it to something less good-sounding precisely
because people think "of course I want the index
optimized".

Third, under no circumstances should you optimize
after every update. This will absolutely kill your
indexing. Optimizing copies all segments into
a single segment. In other words you'll spend a lot
of time copying junk around for no good reason. Here
I'm assuming that by "update" you mean after every batch
of documents is added. If you're talking about optimizing
once after an entire indexing run, it's not so bad.
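
A back-of-the-envelope count shows how quickly the copying
adds up. This is only a sketch: it assumes equal-sized
batches, ignores ordinary background merges, and the batch
count and size are arbitrary numbers picked for illustration.

```python
# Rough count of documents rewritten by optimize, assuming every batch
# has the same size and ignoring ordinary background merges.

def docs_rewritten_optimizing_every_batch(num_batches, batch_size):
    """Each optimize rewrites the whole index as it stands at that point."""
    return sum(batch_size * n for n in range(1, num_batches + 1))

def docs_rewritten_optimizing_once(num_batches, batch_size):
    """A single optimize at the end rewrites the full index one time."""
    return batch_size * num_batches

# Hypothetical run: 100 batches of 1,000 documents each.
every_batch = docs_rewritten_optimizing_every_batch(100, 1000)
once = docs_rewritten_optimizing_once(100, 1000)

print(every_batch, once)  # 5050000 100000
```

Under these assumptions, optimizing after every batch
rewrites about 50 times as many documents as optimizing
once at the end of the run.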

Fourth, one tangible result of optimizing is that the
index is purged of all deleted documents (and remember
that a document update is really a delete followed by
an add). But the same thing happens on segment
merges, which happen without optimizing.
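
The delete-then-add behaviour and the purge on merge can be
sketched with a toy model. Again, this is not Lucene code:
a "segment" here is just a list of (doc_id, deleted) pairs,
and the function names are made up for the example.

```python
# Toy model: a segment is a list of (doc_id, deleted) pairs.

def update(segments, doc_id):
    """Model an update: mark the old copy deleted, add the new copy
    in a fresh segment."""
    marked = [[(d, deleted or d == doc_id) for d, deleted in seg]
              for seg in segments]
    return marked + [[(doc_id, False)]]

def merge(segments):
    """Merge segments into one, dropping documents marked as deleted."""
    return [(doc_id, False)
            for seg in segments
            for doc_id, deleted in seg
            if not deleted]

segments = [[("a", False), ("b", False)], [("c", False)]]
segments = update(segments, "b")

# The old copy of "b" still occupies space until a merge happens.
total_before = sum(len(seg) for seg in segments)

# Any merge, not just an optimize, drops the deleted copy.
merged = merge(segments)

print(total_before, len(merged))  # 4 3
```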

Bottom line: Don't bother to optimize unless and until
you demonstrate that optimizing provides enough of a
performance boost to be worth it. Even then re-check
your assumptions. Look at the various merge policies
to have more control over when merges occur and
the number of segments you have, but try to forget
that optimization even exists <G>....

Best
Erick


There's some good info here...
http://wiki.apache.org/solr/SolrPerformanceFactors


On Mon, Jan 23, 2012 at 12:22 AM, Andrew Harvey <and...@mootpointer.com> wrote:
> We found that optimising too often killed our slave performance. An optimise 
> will cause you to merge and ship the whole index rather than just the 
> relevant portions when you replicate.
>
> The change on our slaves in terms of IO and CPU as well as RAM was marked.
>
> Andrew
>
> Sent on the run.
>
> On 23/01/2012, at 19:03, Maxim Veksler <ma...@vekslers.org> wrote:
>
>> I'm planning on having 1 Master and multiple slaves (cloud based, slaves
>> are going up / down randomly).
>>
>> The slaves should be constantly available, meaning searching performance
>> should optimally not be affected by the updates at all.
>> It's unclear to me how the cluster-based replication works: does it copy
>> the files from the master and update in place? In which case, am I correct
>> to assume that, except for the cache being emptied, the search performance
>> is not affected?
>>
>> Does optimize on the master somehow affect the performance of the slaves?
>> Is it recommended to run optimize after each update, assuming I'm not
>> concerned about locking the master for updates and it's OK if the optimize
>> finishes in under 20min?
>>
>> Thank you,
>> Maxim.
