I would second what Erlend said.

If you nevertheless want to index MP3s, I'd bring this up on the Solr or
Tika boards.
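
To decide which extensions are worth listing in an exclusion rule like the
one Erlend suggests, it can help to see which file types dominate the drive
being crawled. The following is only a rough sketch and is not part of
ManifoldCF, Solr, or Tika; the function name summarize_by_extension is
hypothetical and chosen purely for illustration.

```python
import os
from collections import defaultdict


def summarize_by_extension(root):
    """Return {extension: (file_count, total_bytes)} for all files under root.

    Hypothetical helper: extensions are lower-cased so ".MP3" and ".mp3"
    are counted together. Files that cannot be read are skipped.
    """
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            try:
                size = os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Unreadable or vanished file: leave it out of the summary.
                continue
            totals[ext][0] += 1
            totals[ext][1] += size
    return {ext: (count, size) for ext, (count, size) in totals.items()}
```

Running it against the crawled drive (for the original poster that would be
the E drive) and sorting the result by total bytes should show whether MP3s
or other large binary formats make up most of the content.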

Karl



On Mon, Sep 16, 2013 at 5:15 AM, Erlend Garåsen <[email protected]> wrote:

>
> It seems that Tika is involved and tries to parse large files, i.e. MP3s.
>
> Do you really need to index such files? If not, try to filter them out by
> adding a rule in the "exclude from crawl" field for the configured job.
>
> Erlend
>
>
> On 9/16/13 7:13 AM, Yossi Nachum wrote:
>
>> Hi,
>>
>> I am trying to index the files on my Windows PC with ManifoldCF version
>> 1.3 and Solr version 4.4.
>>
>> I created an output connection and a repository connection, and started
>> a new job that scans my E drive.
>>
>> Everything seems to work at first, but after a few minutes Solr stops
>> receiving new documents; I can see this in the Tomcat log file.
>>
>> In the ManifoldCF crawler UI the job still shows as running, but after a
>> few minutes I get the following error:
>> "Error: Repeated service interruptions - failure processing document:
>> Server at http://localhost:8080/solr/collection1 returned non ok
>> status:500, message:Internal Server Error"
>>
>> The Tomcat process constantly consumes 100% of one CPU (I have two
>> CPUs), even after I get the error message from the ManifoldCF crawler
>> UI.
>>
>> I checked the thread dump in the Solr admin UI and saw that the following
>> threads take the most CPU/user time:
>> "
>> http-8080-3 (32)
>>
>>   * java.io.FileInputStream.readBytes(Native Method)
>>   * java.io.FileInputStream.read(FileInputStream.java:236)
>>   * java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>>   * java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>>   * java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>>   * org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
>>   * java.io.FilterInputStream.read(FilterInputStream.java:133)
>>   * org.apache.tika.io.TailStream.read(TailStream.java:117)
>>   * org.apache.tika.io.TailStream.skip(TailStream.java:140)
>>   * org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
>>   * org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
>>   * org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
>>   * org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
>>   * org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>>   * org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>>   * org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>>   * org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
>>   * org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>>   * org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>   * org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
>>   * org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
>>   * org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
>>   * org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
>>   * org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
>>   * org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>   * org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>   * org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>   * org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>   * org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>   * org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>   * org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>   * org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>>   * org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
>>   * org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
>>   * org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>>   * java.lang.Thread.run(Thread.java:679)
>>
>>
>> "
>>
>> Does anyone know what I can do, or how to debug this issue? I don't see
>> anything in the log files and I am stuck.
>> Thanks,
>> Yossi
>>
>>
>>
>>
>
