In general, do not optimize unless you 1) have a very static index and 2) actually test the search performance afterwards.
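For reference, an "optimize" is just an explicit request to the update handler. A minimal sketch of issuing one by hand (the host, port, and maxSegments value are only examples):

    # Ask Solr to merge the index down to one segment.
    # maxSegments is optional; 1 is the default target anyway.
    curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=1'

    # Equivalent XML message posted to the update handler:
    curl 'http://localhost:8983/solr/update' -H 'Content-type:text/xml' \
         --data-binary '<optimize waitSearcher="true"/>'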
First, as Andrew says, optimizing will force a complete copy of the entire index at replication. If you do NOT optimize, only the most recently written segments are copied.

Second, unless you have quite a large number of segments, optimizing, despite its cool-sounding name, doesn't buy you much. In fact, there's a JIRA to rename it to something less good-sounding, precisely because people think "of course I want the index optimized".

Third, under no circumstances should you optimize after every update. This will absolutely kill your indexing. Optimizing copies all segments into a single segment, so you'll spend a lot of time copying junk around for no good reason. Here I'm assuming that by "update" you mean after every batch of documents is added. If you're talking about after an entire indexing run, it's not so bad.

Fourth, one tangible result of optimizing is that the index is purged of all deleted documents (and remember that a document update is really a delete followed by an add). But the same thing happens on segment merges, which happen without optimizing.

Bottom line: don't bother to optimize unless and until you demonstrate that optimizing provides enough of a performance boost to be worth it. Even then, re-check your assumptions. Look at the various merge policies to get more control over when merges occur and the number of segments you have (see the config sketch at the bottom of this message), but try to forget that optimization even exists <G>....

There's some good info here:
http://wiki.apache.org/solr/SolrPerformanceFactors

Best
Erick

On Mon, Jan 23, 2012 at 12:22 AM, Andrew Harvey <and...@mootpointer.com> wrote:
> We found that optimising too often killed our slave performance. An
> optimise will cause you to merge and ship the whole index rather than
> just the relevant portions when you replicate.
>
> The change on our slaves in terms of IO and CPU as well as RAM was marked.
>
> Andrew
>
> Sent on the run.
>
> On 23/01/2012, at 19:03, Maxim Veksler <ma...@vekslers.org> wrote:
>
>> I'm planning on having 1 master and multiple slaves (cloud based; slaves
>> go up / down randomly).
>>
>> The slaves should be constantly available, meaning search performance
>> should optimally not be affected by the updates at all.
>> It's unclear to me how the cluster-based replication works: does it copy
>> the files from the master and update them in place? In that case, am I
>> correct to assume that, except for the caches being emptied, search
>> performance is not affected?
>>
>> Does optimize on the master somehow affect the performance of the slaves?
>> Is it recommended to run optimize after each update, assuming I'm not
>> concerned about locking the master for updates and it's OK if the
>> optimize finishes in under 20 minutes?
>>
>> Thank you,
>> Maxim.
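P.S. Since merge policies came up above: here's a sketch of what tuning one looks like in solrconfig.xml, assuming a Solr 3.x-style config (TieredMergePolicy is the default in current releases; the values shown are its defaults, purely for illustration):

    <indexDefaults>
      <!-- These knobs control how many segments are allowed to
           accumulate before a background merge runs, so you get
           segment-count control without ever calling optimize. -->
      <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
        <int name="maxMergeAtOnce">10</int>
        <int name="segmentsPerTier">10</int>
      </mergePolicy>
    </indexDefaults>

Lowering segmentsPerTier keeps the segment count down at the cost of more merge I/O during indexing, which is usually a better trade than scheduled optimizes.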