I would second what Erlend said. If you nevertheless want to index MP3s, I'd bring this up on the Solr or Tika boards.
Karl

On Mon, Sep 16, 2013 at 5:15 AM, Erlend Garåsen <[email protected]> wrote:

>
> It seems that Tika is involved and tries to parse large files, i.e. MP3s.
>
> Do you really need to index such files? If not, try to filter them out by
> adding a rule in the "exclude from crawl" field for the configured job.
>
> Erlend
>
>
> On 9/16/13 7:13 AM, Yossi Nachum wrote:
>
>> Hi,
>>
>> I am trying to index my windows pc files with manifoldcf version 1.3 and
>> solr version 4.4.
>>
>> I create output connection and repository connection and started a new
>> job that scan my E drive.
>>
>> Everything seems like it work ok but after a few minutes solr stop
>> getting new, I am seeing that through tomcat log file.
>>
>> On manifold crawler ui I see that the job is still running but after few
>> minutes I am getting the following error:
>> "Error: Repeated service interruptions - failure processing document:
>> Server at http://localhost:8080/solr/collection1 returned non ok
>> status:500, message:Internal Server Error"
>>
>> I am seeing that tomcat process is constantly consume 100% of one cpu (I
>> have two cpu's) even after I get the error message from manifolfcf
>> crawler ui.
>>
>> I check the thread dump in solr admin and saw that the following threads
>> take the most cpu/user time
>> "
>> http-8080-3 (32)
>>
>> * java.io.FileInputStream.readBytes(Native Method)
>> * java.io.FileInputStream.read(FileInputStream.java:236)
>> * java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>> * java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>> * java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>> * org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
>> * java.io.FilterInputStream.read(FilterInputStream.java:133)
>> * org.apache.tika.io.TailStream.read(TailStream.java:117)
>> * org.apache.tika.io.TailStream.skip(TailStream.java:140)
>> * org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
>> * org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
>> * org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
>> * org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
>> * org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>> * org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>> * org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>> * org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
>> * org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>> * org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>> * org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
>> * org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
>> * org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
>> * org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
>> * org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
>> * org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>> * org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>> * org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>> * org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>> * org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>> * org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>> * org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>> * org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>> * org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
>> * org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
>> * org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>> * java.lang.Thread.run(Thread.java:679)
>>
>> "
>>
>> does anyone know what can I do? how to debug this issue? I don't see
>> anything in the log files and I am stuck
>> Thanks,
>> Yossi
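
Before adding the exclusion rule Erlend suggests, it may help to see which files under the crawl root it would affect and how large they are. The script below is only a rough sketch and is not ManifoldCF, Solr, or Tika code: `EXCLUDE_PATTERNS`, `is_excluded`, and `find_excluded` are made-up names, the `*.mp3` pattern is an assumption, and the matching uses Python's glob syntax, which may not be the same syntax the job's "exclude from crawl" field expects.

```python
import fnmatch
import os
import sys

# Assumed pattern list; extend it with other types you do not want indexed.
EXCLUDE_PATTERNS = ("*.mp3",)


def is_excluded(name, patterns=EXCLUDE_PATTERNS):
    """Return True if the file name matches any pattern, ignoring case."""
    lowered = name.lower()
    return any(fnmatch.fnmatchcase(lowered, p.lower()) for p in patterns)


def find_excluded(root, patterns=EXCLUDE_PATTERNS):
    """Yield (path, size_in_bytes) for each matching file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not is_excluded(name, patterns):
                continue
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip files that vanish or cannot be read during the scan.
                continue
            yield path, size


if __name__ == "__main__":
    # Scan the directory given on the command line, or the current one.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size in find_excluded(root):
        print(f"{size:>12}  {path}")
```

Run against the drive being crawled, this prints one line per matching file with its size in bytes, which gives a rough idea of how much work the Tika MP3 parser seen in the thread dump would be spared once those files are excluded.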
