It sounds like he is indexing on a local disk but reading the files to
be indexed from NFS - which would be fine.

You can get Lucene indexes to work on NFS (though it's still not
recommended), but you need to use a custom IndexDeletionPolicy to keep
older commit points around longer, and be sure not to use NIOFSDirectory.
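
What I mean is something along these lines - just a sketch against the
Lucene 2.9 API that Solr 1.4 uses; the class name and the 10-minute grace
period are placeholder choices, not recommendations:

import java.io.IOException;
import java.util.List;

import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexDeletionPolicy;

/**
 * Sketch: rather than deleting a commit point as soon as a newer one
 * exists, keep it around for a grace period so readers on other NFS
 * clients don't have its files deleted out from under them.
 */
public class KeepCommitsAwhileDeletionPolicy implements IndexDeletionPolicy {
  private static final long GRACE_MS = 10L * 60 * 1000; // example value

  public void onInit(List<? extends IndexCommit> commits) throws IOException {
    prune(commits);
  }

  public void onCommit(List<? extends IndexCommit> commits) throws IOException {
    prune(commits);
  }

  private void prune(List<? extends IndexCommit> commits) throws IOException {
    // Commits are ordered oldest to newest; the newest must always survive.
    IndexCommit newest = commits.get(commits.size() - 1);
    long newestTime = newest.getTimestamp();
    for (IndexCommit commit : commits) {
      if (commit != newest && newestTime - commit.getTimestamp() > GRACE_MS) {
        commit.delete(); // only flags it; files go away once unreferenced
      }
    }
  }
}

Solr's built-in SolrDeletionPolicy can get you most of the way there
without custom code, via maxCommitsToKeep/maxCommitAge in solrconfig.xml.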

Feak, Todd wrote:
> I seem to recall hearing something about *not* putting a Solr index directory 
> on an NFS mount. Might want to search on that.
>
> That, of course, doesn't have anything to do with commits showing up 
> unexpectedly in stack traces, per your original email.
>
> -Todd
>
> -----Original Message-----
> From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
> Sent: Tuesday, October 06, 2009 12:39 PM
> To: solr-user@lucene.apache.org; yo...@lucidimagination.com
> Subject: RE: Solr Timeouts
>
> That thread was blocking for an hour while all other threads were idle or 
> blocked.
>
> -----Original Message-----
> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
> Sent: Tuesday, October 06, 2009 3:07 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Timeouts
>
> This specific thread was blocked for an hour?
> If so, I'd echo Lance... this is a local disk right?
>
> -Yonik
> http://www.lucidimagination.com
>
>
> On Mon, Oct 5, 2009 at 2:11 PM, Giovanni Fernandez-Kincade
> <gfernandez-kinc...@capitaliq.com> wrote:
>   
>> I just grabbed another stack trace for a thread that has been similarly 
>> blocking for over an hour. Notice that there is no Commit in this one:
>>
>> http-8080-Processor67 [RUNNABLE] CPU time: 1:02:05
>> org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
>> org.apache.lucene.index.SegmentTermEnum.next()
>> org.apache.lucene.index.SegmentTermEnum.scanTo(Term)
>> org.apache.lucene.index.TermInfosReader.get(Term, boolean)
>> org.apache.lucene.index.TermInfosReader.get(Term)
>> org.apache.lucene.index.SegmentTermDocs.seek(Term)
>> org.apache.lucene.index.DocumentsWriter.applyDeletes(IndexReader, int)
>> org.apache.lucene.index.DocumentsWriter.applyDeletes(SegmentInfos)
>> org.apache.lucene.index.IndexWriter.applyDeletes()
>> org.apache.lucene.index.IndexWriter.doFlushInternal(boolean, boolean)
>> org.apache.lucene.index.IndexWriter.doFlush(boolean, boolean)
>> org.apache.lucene.index.IndexWriter.flush(boolean, boolean, boolean)
>> org.apache.lucene.index.IndexWriter.updateDocument(Term, Document, Analyzer)
>> org.apache.lucene.index.IndexWriter.updateDocument(Term, Document)
>> org.apache.solr.update.DirectUpdateHandler2.addDoc(AddUpdateCommand)
>> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(AddUpdateCommand)
>> org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(SolrContentHandler,
>>  AddUpdateCommand)
>> org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(SolrContentHandler)
>> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(SolrQueryRequest,
>>  SolrQueryResponse, ContentStream)
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest,
>>  SolrQueryResponse)
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(SolrQueryRequest, 
>> SolrQueryResponse)
>> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(SolrQueryRequest,
>>  SolrQueryResponse)
>> org.apache.solr.core.SolrCore.execute(SolrRequestHandler, SolrQueryRequest, 
>> SolrQueryResponse)
>> org.apache.solr.servlet.SolrDispatchFilter.execute(HttpServletRequest, 
>> SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(ServletRequest, 
>> ServletResponse, FilterChain)
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ServletRequest,
>>  ServletResponse)
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ServletRequest, 
>> ServletResponse)
>> org.apache.catalina.core.StandardWrapperValve.invoke(Request, Response)
>> org.apache.catalina.core.StandardContextValve.invoke(Request, Response)
>> org.apache.catalina.core.StandardHostValve.invoke(Request, Response)
>> org.apache.catalina.valves.ErrorReportValve.invoke(Request, Response)
>> org.apache.catalina.core.StandardEngineValve.invoke(Request, Response)
>> org.apache.catalina.connector.CoyoteAdapter.service(Request, Response)
>> org.apache.coyote.http11.Http11Processor.process(InputStream, OutputStream)
>> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(TcpConnection,
>>  Object[])
>> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(Socket, 
>> TcpConnection, Object[])
>> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(Object[])
>> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run()
>> java.lang.Thread.run()
>>
>>
>> -----Original Message-----
>> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
>> Sent: Monday, October 05, 2009 1:18 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Timeouts
>>
>> OK... next step is to verify that SolrCell doesn't have a bug that
>> causes it to commit.
>> I'll try and verify today unless someone else beats me to it.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>> On Mon, Oct 5, 2009 at 1:04 PM, Giovanni Fernandez-Kincade
>> <gfernandez-kinc...@capitaliq.com> wrote:
>>     
>>> I'm fairly certain that all of the indexing jobs are calling SOLR with 
>>> commit=false. They all construct the indexing URLs using a CLR function I 
>>> wrote that takes a Commit parameter, which is always set to false.
>>>
>>> Also, I don't see any calls to commit in the Tomcat logs (whereas normally 
>>> when I make a commit call I do).
>>>
>>> This suggests that Solr is doing it automatically, but the extract handler 
>>> doesn't seem to be the problem:
>>>  <requestHandler name="/update/extract" 
>>> class="org.apache.solr.handler.extraction.ExtractingRequestHandler" 
>>> startup="lazy">
>>>    <lst name="defaults">
>>>      <str name="uprefix">ignored_</str>
>>>      <str name="map.content">fileData</str>
>>>    </lst>
>>>  </requestHandler>
>>>
>>>
>>> There is no external config file specified, and I don't see anything about 
>>> commits here.
>>>
>>> I've tried setting up more detailed indexer logging but haven't been able 
>>> to get it to work:
>>> <infoStream file="c:\solr\indexer.log">true</infoStream>
>>>
>>> I tried relative and absolute paths, but no dice so far.
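
(A note on the infoStream config while we're here: if I remember right, in
the stock Solr 1.4 solrconfig.xml that element lives inside <indexDefaults>,
so I'd check it ended up there rather than elsewhere in the file - roughly:

<indexDefaults>
  <!-- enables IndexWriter's low-level debug logging -->
  <infoStream file="c:\solr\indexer.log">true</infoStream>
</indexDefaults>

The file path is just echoing the one already tried above.)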
>>>
>>> Any other ideas?
>>>
>>> -Gio.
>>>
>>> -----Original Message-----
>>> From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
>>> Sent: Monday, October 05, 2009 12:52 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Solr Timeouts
>>>
>>>       
>>>> This is what one of my SOLR requests look like:
>>>>
>>>> http://titans:8080/solr/update/extract/?literal.versionId=684936&literal.filingDate=1997-12-04T00:00:00Z&literal.formTypeId=95&literal.companyId=3567904&literal.sourceId=0&resource.name=684936.txt&commit=false
>>>>         
>>> Have you verified that all of your indexing jobs (you said you had 4
>>> or 5) have commit=false?
>>>
>>> Also make sure that your extract handler doesn't have a default of
>>> something that could cause a commit - like commitWithin or something.
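
(To make that concrete: a commit-causing default would look something like
the following in solrconfig.xml. These values are hypothetical - neither
appears in the handler config posted above - but this is the shape of thing
to look for:

 <requestHandler name="/update/extract"
     class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
     startup="lazy">
   <lst name="defaults">
     <str name="commit">true</str>
     <str name="commitWithin">1000</str>
   </lst>
 </requestHandler>
)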
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>> On Mon, Oct 5, 2009 at 12:44 PM, Giovanni Fernandez-Kincade
>>> <gfernandez-kinc...@capitaliq.com> wrote:
>>>       
>>>> Is there somewhere other than solrConfig.xml that the autoCommit feature 
>>>> is enabled? I've looked through that file and found autocommit to be 
>>>> commented out:
>>>>
>>>> <!--
>>>>  Perform a <commit/> automatically under certain conditions:
>>>>         maxDocs - number of updates since last commit is greater than this
>>>>         maxTime - oldest uncommitted update (in ms) is this long ago
>>>>    <autoCommit>
>>>>      <maxDocs>10000</maxDocs>
>>>>      <maxTime>1000</maxTime>
>>>>    </autoCommit>
>>>>  -->
>>>>
>>>> -----Original Message-----
>>>> From: Feak, Todd [mailto:todd.f...@smss.sony.com]
>>>> Sent: Monday, October 05, 2009 12:40 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: RE: Solr Timeouts
>>>>
>>>> Actually, ignore my other response.
>>>>
>>>> I believe you are committing, whether you know it or not.
>>>>
>>>> This is in your provided stack trace:
>>>>
>>>> org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor,
>>>>  SolrParams, boolean)
>>>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest,
>>>>  SolrQueryResponse)
>>>>
>>>> I think Yonik gave you additional information for how to make it faster.
>>>>
>>>> -Todd
>>>>
>>>> -----Original Message-----
>>>> From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
>>>> Sent: Monday, October 05, 2009 9:30 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: RE: Solr Timeouts
>>>>
>>>> I'm not committing at all actually - I'm waiting for all 6 million to be 
>>>> done.
>>>>
>>>> -----Original Message-----
>>>> From: Feak, Todd [mailto:todd.f...@smss.sony.com]
>>>> Sent: Monday, October 05, 2009 12:10 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: RE: Solr Timeouts
>>>>
>>>> How often are you committing?
>>>>
>>>> Every time you commit, Solr will close the old index and open the new one. 
>>>> If you are doing this in parallel from multiple jobs (4-5 you mention) 
>>>> then eventually the server gets behind and you start to pile up commit 
>>>> requests. Once this starts to happen, it will cascade out of control if 
>>>> the rate of commits isn't slowed.
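
(Chiming in inline: the way out of that cascade is to let one thing decide
when to commit - run every job with commit=false and issue a single
explicit commit at the very end, or enable autoCommit with generous limits
so only the server triggers them. For example, values along these lines -
example numbers only:

   <autoCommit>
     <maxDocs>50000</maxDocs>
     <maxTime>300000</maxTime> <!-- ms; roughly one commit per 5 minutes -->
   </autoCommit>
)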
>>>>
>>>> -Todd
>>>>
>>>> ________________________________
>>>>
>>>> From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
>>>> Sent: Monday, October 05, 2009 9:04 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Solr Timeouts
>>>>
>>>> Hi,
>>>>
>>>> I'm attempting to index approximately 6 million HTML/Text files using SOLR 
>>>> 1.4/Tomcat6 on Windows Server 2003 x64. I'm running 64 bit Tomcat and JVM. 
>>>> I've fired up 4-5 different jobs that are making indexing requests using 
>>>> the ExtractionRequestHandler, and everything works well for about 30-40 
>>>> minutes, after which all indexing requests start timing out. I profiled 
>>>> the server and found that all of the threads are getting blocked by this 
>>>> call to flush the Lucene index to disk (see below).
>>>>
>>>> This leads me to a few questions:
>>>>
>>>> 1. Is this normal?
>>>>
>>>> 2. Can I reduce the frequency with which this happens somehow? I've 
>>>> greatly increased the indexing options in SolrConfig.xml (attached here) 
>>>> to no avail.
>>>>
>>>> 3. During these flushes, resource utilization (CPU, I/O, memory 
>>>> consumption) is significantly down compared to when requests are being 
>>>> handled. Is there any way to make this indexing go faster? I have plenty 
>>>> of bandwidth on the machine.
>>>>
>>>> I appreciate any insight you can provide. We're currently using MS SQL 
>>>> 2005 as our full-text solution and are pretty much miserable. So far SOLR 
>>>> has been a great experience.
>>>>
>>>> Thanks,
>>>> Gio.
>>>>
>>>> http-8080-Processor21 [RUNNABLE] CPU time: 9:51
>>>> java.io.RandomAccessFile.seek(long)
>>>> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(byte[],
>>>>  int, int)
>>>> org.apache.lucene.store.BufferedIndexInput.refill()
>>>> org.apache.lucene.store.BufferedIndexInput.readByte()
>>>> org.apache.lucene.store.IndexInput.readVInt()
>>>> org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
>>>> org.apache.lucene.index.SegmentTermEnum.next()
>>>> org.apache.lucene.index.SegmentTermEnum.scanTo(Term)
>>>> org.apache.lucene.index.TermInfosReader.get(Term, boolean)
>>>> org.apache.lucene.index.TermInfosReader.get(Term)
>>>> org.apache.lucene.index.SegmentTermDocs.seek(Term)
>>>> org.apache.lucene.index.DocumentsWriter.applyDeletes(IndexReader, int)
>>>> org.apache.lucene.index.DocumentsWriter.applyDeletes(SegmentInfos)
>>>> org.apache.lucene.index.IndexWriter.applyDeletes()
>>>> org.apache.lucene.index.IndexWriter.doFlushInternal(boolean, boolean)
>>>> org.apache.lucene.index.IndexWriter.doFlush(boolean, boolean)
>>>> org.apache.lucene.index.IndexWriter.flush(boolean, boolean, boolean)
>>>> org.apache.lucene.index.IndexWriter.closeInternal(boolean)
>>>> org.apache.lucene.index.IndexWriter.close(boolean)
>>>> org.apache.lucene.index.IndexWriter.close()
>>>> org.apache.solr.update.SolrIndexWriter.close()
>>>> org.apache.solr.update.DirectUpdateHandler2.closeWriter()
>>>> org.apache.solr.update.DirectUpdateHandler2.commit(CommitUpdateCommand)
>>>> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(CommitUpdateCommand)
>>>> org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor,
>>>>  SolrParams, boolean)
>>>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest,
>>>>  SolrQueryResponse)
>>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(SolrQueryRequest, 
>>>> SolrQueryResponse)
>>>> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(SolrQueryRequest,
>>>>  SolrQueryResponse)
>>>> org.apache.solr.core.SolrCore.execute(SolrRequestHandler, 
>>>> SolrQueryRequest, SolrQueryResponse)
>>>> org.apache.solr.servlet.SolrDispatchFilter.execute(HttpServletRequest, 
>>>> SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
>>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(ServletRequest, 
>>>> ServletResponse, FilterChain)
>>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ServletRequest,
>>>>  ServletResponse)
>>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ServletRequest, 
>>>> ServletResponse)
>>>> org.apache.catalina.core.StandardWrapperValve.invoke(Request, Response)
>>>> org.apache.catalina.core.StandardContextValve.invoke(Request, Response)
>>>> org.apache.catalina.core.StandardHostValve.invoke(Request, Response)
>>>> org.apache.catalina.valves.ErrorReportValve.invoke(Request, Response)
>>>> org.apache.catalina.core.StandardEngineValve.invoke(Request, Response)
>>>> org.apache.catalina.connector.CoyoteAdapter.service(Request, Response)
>>>> org.apache.coyote.http11.Http11Processor.process(InputStream, OutputStream)
>>>> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(TcpConnection,
>>>>  Object[])
>>>> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(Socket, 
>>>> TcpConnection, Object[])
>>>> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(Object[])
>>>> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run()
>>>> java.lang.Thread.run()
>>>>
>
>
>   


-- 
- Mark

http://www.lucidimagination.com


