Hey Emyr,

Looking at your stack trace below, my guess is that you have two conflicting 
Apache POI jars on your classpath. A NoSuchMethodError like this usually points 
to that: the class loader is likely picking up some other version of the 
DirectoryNode class, one that doesn't have the iterator() method. 

> java.lang.NoSuchMethodError: 
> org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;
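
If it helps to track this down, here is a rough sketch (Python, standard 
library only) that lists every jar under a directory that contains the 
DirectoryNode class. The function name jars_containing and the directory you 
pass in are my own placeholders, not anything that ships with Solr, Tika or 
POI. It only looks at jar files on disk, so it will not see classes that are 
loaded from anywhere else.

```python
import zipfile
from pathlib import Path

# The class named in the NoSuchMethodError, written as a path inside a jar.
TARGET = "org/apache/poi/poifs/filesystem/DirectoryNode.class"


def jars_containing(root, entry=TARGET):
    """Return the paths of all .jar files under root that contain entry.

    Hypothetical helper for illustration: root is whatever directory you
    want to scan (a webapp's WEB-INF/lib, the container's lib folder, ...).
    """
    hits = []
    for jar in sorted(Path(root).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except (zipfile.BadZipFile, OSError):
            # Skip unreadable or corrupt archives instead of aborting the scan.
            continue
    return hits
```

Run it over the directories your servlet container loads jars from and print 
the result. If more than one jar comes back, those are the candidates for the 
conflict, and removing the older POI jar would be the first thing to try.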

Thanks,
Paul Ramirez


On May 5, 2011, at 6:36 AM, Emyr James wrote:

> Hi All,
> 
> I have solr and tika installed and am happily extracting and indexing 
> various files.
> Unfortunately on some word documents it blows up since it tries to 
> auto-generate a 'title' field but my title field in the schema is single 
> valued.
> 
> Here is my config for the extract handler...
> 
> <requestHandler name="/update/extract" 
> class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
> <lst name="defaults">
> <str name="uprefix">ignored_</str>
> </lst>
> </requestHandler>
> 
> Is there a config option to make it only extract text, or ideally to 
> allow me to specify which metadata fields to accept?
> 
> E.g. I'd like to use any author metadata it finds but to not use any 
> title metadata it finds as I want title to be single valued and set 
> explicitly using a literal.title in the post request.
> 
> I did look around for some docs but all I can find are very basic 
> examples. There's no comprehensive configuration documentation out there 
> as far as I can tell.
> 
> 
> ALSO...
> 
> I get some other bad responses coming back such as...
> 
> <html><head><title>Apache Tomcat/6.0.28 - Error report</title><style><!--
> H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;}
> H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;}
> H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;}
> BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;}
> B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;}
> P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}
> A {color : black;}
> A.name {color : black;}
> HR {color : #525D76;}
> --></style> </head><body><h1>HTTP Status 500 - org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;
> 
> java.lang.NoSuchMethodError: org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:168)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:148)
>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:190)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>     at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
>     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
>     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
>     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
>     at java.lang.Thread.run(Thread.java:636)
> </h1><HR size="1" noshade="noshade"><p><b>type</b> Status 
> report</p><p><b>message</b> 
> <u>org.apache.poi.poifs.filesystem.DirectoryNode.iterator()Ljava/util/Iterator;
> 
> For the above my url was...
> 
>  
> http://localhost:8080/solr/update/extract?literal.id=3922&defaultField=content&fmap.content=content&uprefix=ignored_&stream.contentType=application%2Fvnd.ms-powerpoint&commit=true&literal.title=Reactor+cycle+141&literal.notes=&literal.tag=UCN_production&literal.author=Maurits+van+der+Grinten
> 
> I guess there's something special I need to be able to process 
> PowerPoint files? Maybe I need to get the latest Apache POI? Any 
> suggestions welcome...
> 
> 
> Regards,
> 
> Emyr
