Hi. I'm using Solr 4.0 Beta (default installation, no modifications) to
index documents, and it's blowing up on some Word docs:

  curl "http://localhost:8983/solr/update/extract?literal.id=doc15&commit=true" -F "myfile=@15.doc"

Here's the exception. The same files go through Solr 3.6.1 just fine.

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader"><int name="status">500</int><int name="QTime">18</int></lst>
    <lst name="error"><str name="msg">org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce</str>
    <str name="trace">org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
            at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:230)
            at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
            at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
            at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
            at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
            at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
            at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
            at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
            at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
            at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
            at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
            at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
            at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
            at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
            at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
            at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
            at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
            at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
            at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
            at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
            at org.eclipse.jetty.server.Server.handle(Server.java:351)
            at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
            at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
            at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
            at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
            at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:642)
            at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
            at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
            at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
            at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
            at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
            at java.lang.Thread.run(Unknown Source)
    Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
            at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
            at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
            at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
            at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:224)
            ... 31 more
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
            at org.apache.poi.util.LittleEndian.getInt(LittleEndian.java:163)
            at org.apache.poi.hwpf.model.Colorref.&lt;init&gt;(Colorref.java:81)
            at org.apache.poi.hwpf.model.types.SHDAbstractType.fillFields(SHDAbstractType.java:56)
            at org.apache.poi.hwpf.usermodel.ShadingDescriptor.&lt;init&gt;(ShadingDescriptor.java:38)
            at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.unCompressCHPOperation(CharacterSprmUncompressor.java:582)
            at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.uncompressCHP(CharacterSprmUncompressor.java:65)
            at org.apache.poi.hwpf.model.StyleSheet.createChp(StyleSheet.java:288)
            at org.apache.poi.hwpf.model.StyleSheet.&lt;init&gt;(StyleSheet.java:121)
            at org.apache.poi.hwpf.HWPFDocument.&lt;init&gt;(HWPFDocument.java:346)
            at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:77)
            at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
            at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
            at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
            ... 34 more
    </str><int name="code">500</int></lst>
    </response>
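
Since only some Word docs trigger the error, it may help to post a whole directory and record which files come back with a 500. The following is a minimal sketch, not something I have run against every setup: the function names `extract_url` and `post_docs` are made up for illustration, the `doc` prefix on the id just mirrors the `doc15` example in the curl command at the top of this message, and it assumes the files are named like `15.doc` and sit in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: build the extract URL for a given document id,
# using the same host, port and parameters as the curl command in this post.
extract_url() {
  printf 'http://localhost:8983/solr/update/extract?literal.id=%s&commit=true' "$1"
}

# Hypothetical helper: post every .doc file in the current directory and
# print "<file> <HTTP status>" so the failing files (status 500) stand out.
post_docs() {
  for f in *.doc; do
    # Skip the unexpanded pattern when no .doc files are present.
    [ -e "$f" ] || continue
    id="doc${f%.doc}"
    status=$(curl -s -o /dev/null -w '%{http_code}' "$(extract_url "$id")" -F "myfile=@$f")
    echo "$f $status"
  done
}
```

Calling `post_docs` from the directory holding the files would list each file with its status code; the ones reporting 500 are the candidates to attach to a bug report.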

Sincerely,
Alex 
