Wonderful input. Thank you very much, Erick. One question: I've been told that Solr supports a multi-core mode of operation in which you build the index on the master (optimized or not) and then pass it to a "stand-by" core on the slaves. Once synchronization is complete, you swap the active and passive cores on the slave (an operation that is claimed to be atomic and can happen at run time). Have you or other members of this list had experience with this mode of operation?
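For reference, the swap described above corresponds to Solr's CoreAdmin `SWAP` action, which takes a `core` and an `other` parameter and exchanges the names of the two cores. The sketch below only builds the request URL and does not send it; the host `slave1` and the core names `live` and `standby` are hypothetical placeholders, and whether this matches the exact workflow Maxim was told about is an assumption.

```python
from urllib.parse import urlencode


def core_swap_url(base_url: str, active: str, standby: str) -> str:
    """Build a CoreAdmin SWAP request URL (the request is not sent here).

    base_url, active and standby are placeholders: substitute your own
    slave's Solr root URL and core names.
    """
    params = {"action": "SWAP", "core": active, "other": standby}
    return base_url.rstrip("/") + "/admin/cores?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical slave host and core names, for illustration only.
    url = core_swap_url("http://slave1:8983/solr", "live", "standby")
    print(url)
    # To actually issue the swap you could call urllib.request.urlopen(url).
```

After the swap, queries addressed to `live` are served by the index that was just synchronized into `standby`, and the old index is left under the `standby` name for the next round.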
Thank you.

On Mon, Jan 23, 2012 at 7:25 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> In general, do not optimize unless you
> 1> have a very static index
> 2> actually test the search performance afterwards.
>
> First, as Andrew says, optimizing will force a complete
> copy of the entire index at replication. If you do NOT
> optimize, only the most recently written segments
> are copied.
>
> Second, unless you have a quite large number of
> segments, optimizing, despite its cool-sounding name,
> doesn't buy you much. In fact there's a JIRA to
> rename it to something less good-sounding precisely
> because people think "of course I want the index
> optimized".
>
> Third, under no circumstances should you optimize
> after every update. This will absolutely kill your
> indexing. Optimizing merges all segments into
> a single segment. In other words, you'll spend a lot
> of time copying junk around for no good reason. Here
> I'm assuming by "update" you mean after every batch
> of documents is added. If you mean after an entire
> indexing run, it's not so bad.
>
> Fourth, one tangible result of optimizing is that the
> index is purged of all deleted documents (and remember
> that a document update is really a delete followed by
> an add). But the same thing happens, for the segments
> involved, during ordinary segment merges, which happen
> without optimizing.
>
> Bottom line: Don't bother to optimize unless and until
> you demonstrate that optimizing provides enough of a
> performance boost to be worth it. Even then, re-check
> your assumptions. Look at the various merge policies
> to have more control over when merges occur and
> the number of segments you have, but try to forget
> that optimization even exists <G>...
>
> Best
> Erick
>
> There's some good info here...
> http://wiki.apache.org/solr/SolrPerformanceFactors
>
> Best
> Erick
>
> On Mon, Jan 23, 2012 at 12:22 AM, Andrew Harvey <and...@mootpointer.com> wrote:
> > We found that optimising too often killed our slave performance.
> > An optimise will cause you to merge and ship the whole index rather
> > than just the relevant portions when you replicate.
> >
> > The change on our slaves in terms of IO and CPU as well as RAM was
> > marked.
> >
> > Andrew
> >
> > Sent on the run.
> >
> > On 23/01/2012, at 19:03, Maxim Veksler <ma...@vekslers.org> wrote:
> >
> >> I'm planning on having 1 master and multiple slaves (cloud based; slaves
> >> go up and down randomly).
> >>
> >> The slaves should be constantly available, meaning search performance
> >> should ideally not be affected by the updates at all.
> >> It's unclear to me how the cluster-based replication works: does it copy
> >> the files from the master and update in place? In that case, am I correct
> >> to assume that, apart from the caches being emptied, search performance is
> >> not affected?
> >>
> >> Does optimize on the master somehow affect the performance of the slaves?
> >> Is it recommended to run optimize after each update, assuming I'm not
> >> concerned about locking the master for updates and it's OK if the optimize
> >> finishes in under 20 min?
> >>
> >> Thank you,
> >> Maxim.
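To make Erick's and Andrew's point about replication cost concrete, here is a toy model, not Solr's actual ReplicationHandler code: it assumes the slave downloads only those index files from the master's latest commit that it does not already hold (compared by name and size). All file names and sizes (in MB) are invented for illustration.

```python
from typing import Dict

# Toy model: a "snapshot" maps index file names to sizes in MB.
# File names and sizes below are invented for illustration.
Snapshot = Dict[str, int]


def files_to_fetch(master: Snapshot, slave: Snapshot) -> Snapshot:
    """Files in the master's commit that the slave lacks or holds at a different size."""
    return {name: size for name, size in master.items()
            if slave.get(name) != size}


def mb_to_fetch(master: Snapshot, slave: Snapshot) -> int:
    """Total size the slave would have to download under this model."""
    return sum(files_to_fetch(master, slave).values())


slave = {"_0.cfs": 900, "_1.cfs": 80, "segments_2": 1}

# A new batch of documents was added: existing segments are untouched,
# and one small segment plus a new commit point appear.
master_incremental = {"_0.cfs": 900, "_1.cfs": 80, "_2.cfs": 20, "segments_3": 1}

# After an optimize, everything is rewritten into one brand-new segment,
# so none of the slave's existing files can be reused.
master_optimized = {"_3.cfs": 950, "segments_4": 1}

if __name__ == "__main__":
    print("incremental:", mb_to_fetch(master_incremental, slave), "MB")
    print("optimized:  ", mb_to_fetch(master_optimized, slave), "MB")
```

Under these assumptions the incremental commit ships only the new segment and commit point, while the optimized commit ships essentially the whole index, which is the IO and network cost Andrew observed on the slaves.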