Right. If I've multiplied right, you're essentially replacing your entire index every day given the rate you're adding documents.
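The back-of-the-envelope arithmetic behind that claim can be sketched in a few lines (the 50-300 adds/sec and ~15M-doc figures are the ones quoted in this thread; everything else is plain arithmetic):

```python
# Rough churn estimate: at 50-300 adds/sec against a ~15M-doc index,
# what fraction of the index turns over per day?
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def daily_adds(adds_per_second):
    """Documents added (mostly re-adds of existing docs) in one day."""
    return adds_per_second * SECONDS_PER_DAY

index_size = 15_000_000  # approximate index size from the thread

for rate in (50, 300):
    adds = daily_adds(rate)
    print(f"{rate} adds/sec -> {adds:,} docs/day "
          f"({adds / index_size:.1f}x the index)")
```

At the high end of that range the daily add volume exceeds the whole index, which is why optimize-style thinking feels natural here even though merging already handles it.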
Have a look at MergePolicy; here are a couple of references:

http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/
https://lucene.apache.org/core/old_versioned_docs/versions/3_2_0/api/core/org/apache/lucene/index/MergePolicy.html
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

But unless you're having problems with performance, I'd consider just optimizing once a day at off-peak hours.

FWIW,
Erick

On Fri, Oct 12, 2012 at 5:35 PM, Petersen, Robert <rober...@buy.com> wrote:
> Hi Erick,
>
> After reading the discussion you guys were having about renaming optimize
> to forceMerge, I realized I was guilty of over-optimizing like you were
> worried about! We have about 15 million docs indexed now and we push about
> 50-300 adds per second 24/7, most of them updates to existing documents
> whose data has changed since the last time they were indexed (which we
> track in a DB table). There are some new documents being added in the mix,
> and some deletes as well.
>
> I understand now how the merge policy caps the number of segments; I used
> to think they would grow unbounded and thus that optimize was required.
> How does the large number of updates to existing documents affect the need
> to optimize, given that each update causes a delete plus a 're-add'? I
> suppose that means the index size tends to grow, with the deleted docs
> hanging around in the background, as it were.
>
> So in our situation, what frequency of optimize would you recommend? We're
> on 3.6.1, btw...
>
> Thanks,
> Robi
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, October 11, 2012 5:29 AM
> To: solr-user@lucene.apache.org
> Subject: Re: anyone have any clues about this exception
>
> Well, you'll actually still be able to optimize; it's just called forceMerge.
>
> But the point is that optimize seems like something that _of course_ you
> want to do, when in reality it's not something you usually should do at
> all. Optimize does two things:
> 1> merges all the segments into one (usually)
> 2> removes all of the info associated with deleted documents.
>
> Of the two, point <2> is the one that really counts, and that's done
> whenever segment merging happens anyway. So unless you have a very large
> number of deletes (or updates of the same document), optimize buys you
> very little. You can tell by the difference between numDocs and maxDoc on
> the admin page.
>
> So what happens if you just don't bother to optimize? As an alternative,
> take a look at merge policy to help control how merging happens.
>
> Best
> Erick
>
> On Wed, Oct 10, 2012 at 3:04 PM, Petersen, Robert <rober...@buy.com> wrote:
>> You could be right. Going back in the logs, I noticed it used to happen
>> less frequently, and always towards the end of an optimize operation. It
>> is probably my indexer timing out while waiting for updates to complete
>> during optimizes. The errors grew recently because I upped the indexer
>> thread count to 22 threads, so there are a lot more timeouts occurring
>> now. Also, our index has grown to double its old size, so the optimize
>> operation has started taking a lot longer, which also contributes to what
>> I'm seeing. I have just changed my optimize frequency from three times a
>> day to once a day after reading the following issue, where they talk
>> about deprecating the optimize command in a future version of Solr:
>> https://issues.apache.org/jira/browse/SOLR-3141
>>
>> -----Original Message-----
>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>> Sent: Wednesday, October 10, 2012 11:10 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: anyone have any clues about this exception
>>
>> Something timed out, and the other end closed the connection.
>> This end tried to write to the closed pipe and died, and something tried
>> to catch that exception, write its own, and died even worse? Just making
>> it up really, but it sounds plausible (plus a 3-year Java tech-support
>> hunch).
>>
>> If it happens often enough, see if you can run WireShark on that
>> machine's network interface and catch the whole network conversation in
>> action. Often there are enough clues there just from looking at the TCP
>> packets and/or the stuff transmitted. WireShark is a power tool, so it
>> takes a little while the first time, but the learning will pay for itself
>> over and over again.
>>
>> Regards,
>>    Alex.
>>
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all at
>> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
>>
>> On Wed, Oct 10, 2012 at 11:31 PM, Petersen, Robert <rober...@buy.com> wrote:
>>> The Tomcat localhost log (not the catalina log) for my Solr 3.6.1
>>> (master) instance contains lots of these exceptions, but Solr itself
>>> seems to be doing fine... any ideas? I'm not seeing these exceptions
>>> logged on my slave servers, btw; just on the master, where we do our
>>> indexing.
>>>
>>> Oct 9, 2012 5:34:11 PM org.apache.catalina.core.StandardWrapperValve invoke
>>> SEVERE: Servlet.service() for servlet default threw exception
>>> java.lang.IllegalStateException
>>>     at org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
>>>     at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:389)
>>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:291)
>>>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>>     at com.googlecode.psiprobe.Tomcat60AgentValve.invoke(Tomcat60AgentValve.java:30)
>>>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>>>     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
>>>     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>>     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
>>>     at java.lang.Thread.run(Unknown Source)
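The numDocs-vs-maxDoc check mentioned earlier in the thread can be scripted against Solr's Luke handler. A minimal sketch (the base URL is an assumption for illustration; point it at your own master, and note the JSON shape is what the stock Luke handler returns):

```python
import json
from urllib.request import urlopen

def deleted_ratio(num_docs, max_doc):
    """Fraction of the index occupied by deleted (not-yet-merged-away) docs.

    numDocs counts live documents; maxDoc also counts deletions that merging
    hasn't reclaimed yet, so the gap between them is the 'wasted' portion.
    """
    if max_doc == 0:
        return 0.0
    return (max_doc - num_docs) / max_doc

def fetch_index_stats(base_url="http://localhost:8983/solr"):
    # The Luke handler reports per-index counters; numTerms=0 keeps the
    # response small. base_url is a placeholder, not from the thread.
    with urlopen(base_url + "/admin/luke?numTerms=0&wt=json") as resp:
        stats = json.load(resp)["index"]
    return stats["numDocs"], stats["maxDoc"]

# Example usage against a live master (values are illustrative):
#   num_docs, max_doc = fetch_index_stats("http://solr-master:8983/solr")
#   print(f"deleted fraction: {deleted_ratio(num_docs, max_doc):.1%}")
```

Running something like this from cron and only forcing a merge when the deleted fraction is large follows the advice above: with mostly re-adds, the ratio tells you whether an off-peak optimize is worth it at all.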