you can always commit them one at a time to the ExtractingRequestHandler http://wiki.apache.org/solr/ExtractingRequestHandler
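
To make the one-at-a-time idea concrete, here is a rough Python sketch (standard library only) that posts each file separately with a client-side timeout and reports the ones that fail or stall. The URL is an assumption: port 8080 is taken from the thread name in your dump, and the handler path /update/extract and the default core are guesses you will need to adjust. The helper names post_file and find_problem_files are made up for this example. extractOnly=true is used so that nothing is actually indexed while you test. Since your trace ends in Mp3Parser, the .mp3 files are probably the first ones to try.

```python
import os
import time
import urllib.parse
import urllib.request

# Assumed endpoint: Tomcat on port 8080, default core, handler at /update/extract.
SOLR_EXTRACT_URL = "http://localhost:8080/solr/update/extract"


def post_file(path, timeout, url=SOLR_EXTRACT_URL):
    """Send one file to the extract handler; raises on HTTP error or timeout."""
    # extractOnly=true asks Solr to run Tika and return the result without indexing.
    # resource.name passes the file name along as a hint for type detection.
    query = urllib.parse.urlencode(
        {"extractOnly": "true", "resource.name": os.path.basename(path)}
    )
    with open(path, "rb") as fh:
        body = fh.read()  # whole file in memory: fine for a sketch, not for huge files
    request = urllib.request.Request(
        url + "?" + query,
        data=body,
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()


def find_problem_files(paths, timeout=60.0, post=post_file):
    """Post files one at a time; return (path, seconds, error) for each failure."""
    problems = []
    for path in paths:
        started = time.monotonic()
        try:
            post(path, timeout)
        except OSError as exc:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            problems.append((path, time.monotonic() - started, repr(exc)))
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python find_problem_files.py C:\path\to\crawled\folder
    files = [
        os.path.join(folder, name)
        for folder, _, names in os.walk(sys.argv[1])
        for name in names
    ]
    for path, seconds, error in find_problem_files(files):
        print(f"{seconds:7.1f}s  {path}  {error}")
```

Keep in mind that the timeout only makes the client give up; the Tomcat thread may keep spinning on that file, so you may have to restart Tomcat after a hit before continuing with the rest.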

Best,
Erick

On Tue, Sep 17, 2013 at 6:47 AM, Yossi Nachum <nachum...@gmail.com> wrote:

> Hi,
>
> I am trying to index my windows pc files with manifoldcf version 1.3 and
> solr version 4.4.
>
> Few minutes after I start the crawler job I see that tomcat process
> constantly consume 100% of one cpu (I have two cpu's).
>
> I check the thread dump in solr admin and saw that the following threads
> take the most cpu/user time
> "
> http-8080-3 (32)
>
> - java.io.FileInputStream.readBytes(Native Method)
> - java.io.FileInputStream.read(FileInputStream.java:236)
> - java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> - java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> - java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> - org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
> - java.io.FilterInputStream.read(FilterInputStream.java:133)
> - org.apache.tika.io.TailStream.read(TailStream.java:117)
> - org.apache.tika.io.TailStream.skip(TailStream.java:140)
> - org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
> - org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
> - org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
> - org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
> - org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> - org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> - org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> - org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
> - org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> - org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> - org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
> - org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
> - org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
> - org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
> - org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
> - org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> - org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> - org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> - org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> - org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> - org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> - org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> - org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> - org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
> - org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
> - org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
> - java.lang.Thread.run(Thread.java:679)
>
> "
>
> how can I check which file cause tika to work so hard?
> I don't see anything in the log files and I am stuck
> Thanks,
> Yossi