Right. If I've done the math right, you're essentially replacing your entire
index every day given the rate you're adding documents.

Have a look at MergePolicy, here are a couple of references:
http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/
https://lucene.apache.org/core/old_versioned_docs/versions/3_2_0/api/core/org/apache/lucene/index/MergePolicy.html
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

But unless you're having problems with performance, I'd consider just
optimizing once a day at off-peak hours.
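
If you do keep a nightly optimize, in 3.x it's just an `<optimize/>` XML update
message POSTed to the update handler. A minimal sketch in Python — the URL is an
assumption for a default single-core setup, adjust for yours:

```python
# Sketch: build the Solr 3.x optimize request (an <optimize/> XML
# update message). The URL is a placeholder for a default single-core
# install; adjust host, port, and path for your deployment.
import urllib.request

def build_optimize_request(url="http://localhost:8983/solr/update"):
    return urllib.request.Request(
        url,
        data=b"<optimize/>",  # Solr 3.x XML update command
        headers={"Content-Type": "text/xml"},
    )

# To actually fire it at off-peak hours (e.g. from a nightly cron job):
#     urllib.request.urlopen(build_optimize_request())
```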

FWIW,
Erick

On Fri, Oct 12, 2012 at 5:35 PM, Petersen, Robert <rober...@buy.com> wrote:
> Hi Erick,
>
> After reading the discussion you guys were having about renaming optimize to 
> forceMerge I realized I was guilty of over-optimizing like you guys were 
> worried about!  We have about 15 million docs indexed now and we spin about 
> 50-300 adds per second 24/7, most of them being updates to existing documents 
> whose data has changed since the last time it was indexed (which we keep 
> track of in a DB table).  There are some new documents being added in the mix 
> and some deletes as well.
>
> I understand now how the merge policy caps the number of segments.  I used to 
> think they would grow unbounded and thus optimize was required.  How does the 
> large number of updates to existing documents affect the need to optimize, 
> since each update causes a delete plus a re-add?  And so I suppose that 
> means the index size tends to grow with the deleted docs hanging around in 
> the background, as it were.
>
> So in our situation, what frequency of optimize would you recommend?  We're 
> on 3.6.1 btw...
>
> Thanks,
> Robi
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, October 11, 2012 5:29 AM
> To: solr-user@lucene.apache.org
> Subject: Re: anyone have any clues about this exception
>
> Well, you'll actually be able to optimize, it's just called forceMerge.
>
> But the point is that optimize seems like something that _of course_ you want 
> to do, when in reality it's not something you usually should do at all. 
> Optimize does two things:
> 1> merges all the segments into one (usually)
> 2> removes all of the info associated with deleted documents.
>
> Of the two, point <2> is the one that really counts and that's done whenever 
> segment merging is done anyway. So unless you have a very large number of 
> deletes (or updates of the same document), optimize buys you very little. You 
> can tell this by the difference between numDocs and maxDoc in the admin page.
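
To put a number on <2>: the deleted-document overhead is just (maxDoc - numDocs) / maxDoc. A small sketch using those two admin-page values (the figures in the example are hypothetical):

```python
def deleted_ratio(num_docs, max_doc):
    """Fraction of the index taken up by deleted documents.

    numDocs and maxDoc are the values shown on the Solr admin stats
    page: numDocs counts live documents, while maxDoc also counts
    deleted documents whose segments haven't been merged away yet.
    """
    if max_doc == 0:
        return 0.0
    return (max_doc - num_docs) / float(max_doc)

# Hypothetical numbers: 15M live docs in segments holding 20M total.
print("%.0f%% deleted" % (deleted_ratio(15000000, 20000000) * 100))
# prints: 25% deleted
```

If that ratio stays small between natural merges, a scheduled optimize buys you very little.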
>
> So what happens if you just don't bother to optimize? As an alternative, take 
> a look at merge policy to help control how merging happens.
>
> Best
> Erick
>
> On Wed, Oct 10, 2012 at 3:04 PM, Petersen, Robert <rober...@buy.com> wrote:
>> You could be right.  Going back in the logs, I noticed it used to happen 
>> less frequently and always towards the end of an optimize operation.  It is 
>> probably my indexer timing out waiting for updates to occur during 
>> optimizes.  The errors grew recently due to my upping the indexer 
>> threadcount to 22 threads, so there's a lot more timeouts occurring now.  
>> Also our index has grown to double the old size so the optimize operation 
>> has started taking a lot longer, also contributing to what I'm seeing.   I 
>> have just changed my optimize frequency from three times a day to once a day 
>> after reading the following:
>>
>> Here they are talking about completely deprecating the optimize
>> command in the next version of Solr...
>> https://issues.apache.org/jira/browse/SOLR-3141
>>
>>
>> -----Original Message-----
>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>> Sent: Wednesday, October 10, 2012 11:10 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: anyone have any clues about this exception
>>
>> Something timed out, the other end closed the connection, this end tried to 
>> write to the now-closed pipe and died, and something tried to catch that 
>> exception and write its own and died even worse? Just guessing really, but it 
>> sounds plausible (plus a 3-year Java tech-support hunch).
>>
>> If it happens often enough, see if you can run WireShark on that machine's 
>> network interface and catch the whole network conversation in action. Often 
>> there are enough clues there from looking at the tcp packets and/or the stuff 
>> transmitted. WireShark is a power-tool, so it takes a little while the first 
>> time, but the learning will pay for itself over and over again.
>>
>> Regards,
>>    Alex.
>>
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>>
>>
>> On Wed, Oct 10, 2012 at 11:31 PM, Petersen, Robert <rober...@buy.com> wrote:
>>> Tomcat localhost log (not the catalina log) for my  solr 3.6.1 (master) 
>>> instance contains lots of these exceptions but solr itself seems to be 
>>> doing fine... any ideas?  I'm not seeing these exceptions being logged on 
>>> my slave servers btw, just the master where we do our indexing.
>>>
>>>
>>>
>>> Oct 9, 2012 5:34:11 PM org.apache.catalina.core.StandardWrapperValve
>>> invoke
>>> SEVERE: Servlet.service() for servlet default threw exception
>>> java.lang.IllegalStateException
>>>                 at 
>>> org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
>>>                 at 
>>> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:389)
>>>                 at 
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:291)
>>>                 at 
>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>                 at 
>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>                 at 
>>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>                 at 
>>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>                 at 
>>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>>                 at 
>>> com.googlecode.psiprobe.Tomcat60AgentValve.invoke(Tomcat60AgentValve.java:30)
>>>                 at 
>>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>                 at 
>>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>                 at 
>>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>>>                 at 
>>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
>>>                 at 
>>> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>>                 at 
>>> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
>>>                 at java.lang.Thread.run(Unknown Source)
>>
>
>
