Hi. I'm using Solr 4.0 Beta (no modifications to default installation) to index, and it's blowing up on some Word docs:
curl "http://localhost:8983/solr/update/extract?literal.id=doc15&commit=true" -F "myfile=@15.doc"

Here's the exception. And the same files go through Solr 3.6.1 just fine.

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">500</int><int name="QTime">18</int></lst>
<lst name="error"><str name="msg">org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce</str>
<str name="trace">org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:230)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
        at org.eclipse.jetty.server.Server.handle(Server.java:351)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:642)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
        at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:224)
        ... 31 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
        at org.apache.poi.util.LittleEndian.getInt(LittleEndian.java:163)
        at org.apache.poi.hwpf.model.Colorref.<init>(Colorref.java:81)
        at org.apache.poi.hwpf.model.types.SHDAbstractType.fillFields(SHDAbstractType.java:56)
        at org.apache.poi.hwpf.usermodel.ShadingDescriptor.<init>(ShadingDescriptor.java:38)
        at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.unCompressCHPOperation(CharacterSprmUncompressor.java:582)
        at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.uncompressCHP(CharacterSprmUncompressor.java:65)
        at org.apache.poi.hwpf.model.StyleSheet.createChp(StyleSheet.java:288)
        at org.apache.poi.hwpf.model.StyleSheet.<init>(StyleSheet.java:121)
        at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:346)
        at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:77)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        ... 34 more
</str><int name="code">500</int></lst>
</response>

Sincerely,
Alex