First of all:

This may not be a Solr-related problem. Did you check the Solr log file?
Tika can run into trouble with certain kinds of input. For example, parsing
HTML that contains a base64-encoded image can cause problems. If you find
the relevant log entries, you should be able to identify the cause. On the
other hand, keep an eye on ManifoldCF as well; the problem may be there too.
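To narrow down which document is responsible, one option is to run each file through the parser on its own, with a time limit, and report the ones that do not finish. The sketch below uses only the JDK. `SlowFileFinder`, `findSlowFiles` and `parseFile` are hypothetical names, and `parseFile` is only a placeholder that reads the file and discards the bytes. To actually reproduce the problem you would replace its body with a real Tika call (for example through `AutoDetectParser`, the entry point visible in the thread dump), which needs the Tika jars on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SlowFileFinder {

    /** Runs {@code parser} on each file and returns the files that exceed {@code timeoutMillis}. */
    static List<Path> findSlowFiles(List<Path> files, Consumer<Path> parser, long timeoutMillis) {
        List<Path> slow = new ArrayList<>();
        for (Path file : files) {
            // One daemon worker per file: a parse that never returns cannot
            // delay the remaining files or keep the JVM from exiting.
            ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "parse-" + file.getFileName());
                t.setDaemon(true);
                return t;
            });
            Future<?> result = worker.submit(() -> parser.accept(file));
            try {
                result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                slow.add(file);
                result.cancel(true); // only interrupts; a busy loop may keep running
            } catch (ExecutionException e) {
                System.err.println("Failed: " + file + " -> " + e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            } finally {
                worker.shutdownNow();
            }
        }
        return slow;
    }

    /** Hypothetical placeholder: replace the body with a real Tika parse of the file. */
    static void parseFile(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // discard the bytes; a real version would hand the stream to Tika
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path p : findSlowFiles(files, SlowFileFinder::parseFile, 30_000)) {
            System.out.println("Timed out: " + p);
        }
    }
}
```

Note that `cancel(true)` only interrupts the worker thread; a parser spinning in a loop may ignore that, which is why the workers are daemon threads. Walking the root of a whole drive can also abort on folders the account is not allowed to read, so it is easier to start with a subfolder of E:.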

On Tuesday, 17 September 2013, Yossi Nachum <nachum...@gmail.com>
wrote:
> Hi,
>
> I am trying to index the files on my Windows PC with ManifoldCF 1.3 and
> Solr 4.4.
>
> I created an output connection and a repository connection and started a
> new job that scans my E: drive.
>
> Everything seems to work, but after a few minutes Solr stops receiving
> new files to index. I can see this in the Tomcat log file.
>
> In the ManifoldCF crawler UI I see that the job is still running, but
> after a few minutes I get the following error:
> "Error: Repeated service interruptions - failure processing document: Read
> timed out"
>
> I see that the Tomcat process constantly consumes 100% of one CPU (I
> have two CPUs), even after I get the error message in the ManifoldCF
> crawler UI.
>
> I checked the thread dump in the Solr admin UI and saw that the following
> thread takes the most CPU/user time:
> "
> http-8080-3 (32)
>
>    - java.io.FileInputStream.readBytes(Native Method)
>    - java.io.FileInputStream.read(FileInputStream.java:236)
>    - java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>    - java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>    - java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>    - org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
>    - java.io.FilterInputStream.read(FilterInputStream.java:133)
>    - org.apache.tika.io.TailStream.read(TailStream.java:117)
>    - org.apache.tika.io.TailStream.skip(TailStream.java:140)
>    - org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
>    - org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
>    - org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
>    - org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
>    - org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>    - org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>    - org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>    - org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
>    - org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>    - org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>    - org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
>    - org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
>    - org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
>    - org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
>    - org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
>    - org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>    - org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>    - org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>    - org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>    - org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>    - org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>    - org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>    - org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>    - org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
>    - org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
>    - org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>    - java.lang.Thread.run(Thread.java:679)
>
> "
>
> Does anyone know what I can do? How can I debug this issue, and how can I
> check which file causes Tika to work so hard?
> I don't see anything in the log files and I am stuck.
> Thanks,
> Yossi
>
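On the question of which file makes Tika work so hard: the thread dump shows the busy thread inside Tika's `Mp3Parser`, skipping MPEG frames in `MpegStream.skipFrame`, so MP3 files on the E: drive are a reasonable first suspect, although the dump itself does not name the file. Tika detects the type from content as well as from the name, so a file with another extension could also end up in `Mp3Parser`. The minimal JDK-only sketch below, with the hypothetical class name `Mp3Candidates`, lists every `.mp3` file under a folder, largest first. Ordering by size is only a heuristic assumption; a small but malformed file could just as well be the culprit.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class Mp3Candidates {

    /** Regular files under {@code root} named *.mp3 (any case), largest first. */
    static List<Path> listMp3BySize(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".mp3"))
                    .sorted(Comparator.<Path>comparingLong(Mp3Candidates::sizeOrZero).reversed())
                    .collect(Collectors.toList());
        }
    }

    // Treat an unreadable size as 0 so one bad entry does not abort the listing.
    private static long sizeOrZero(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            return 0L;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : listMp3BySize(root)) {
            System.out.printf("%,15d  %s%n", sizeOrZero(p), p);
        }
    }
}
```

Sending a few of these candidates to Solr one at a time, or temporarily excluding MP3 files from the ManifoldCF job if your connector configuration allows it, should show whether they are the cause.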
