I removed the FieldReaderDataSource and dataSource="fld", but it didn't help. I now get the following for each document:

DataImportHandlerException: Exception in invoking url null Processing Document # 9 NullPointerException

On 26. Sep 2013, at 8:39 PM, P Williams wrote:

> Hi,
>
> Haven't tried this myself but maybe try leaving out the
> FieldReaderDataSource entirely. From my quick searching looks like it's
> tied to SQL. Did you try copying the
> http://wiki.apache.org/solr/TikaEntityProcessor Advanced Parsing example
> exactly? What happens when you leave out FieldReaderDataSource?
>
> Cheers,
> Tricia
>
>
> On Thu, Sep 26, 2013 at 4:17 AM, Andreas Owen <a...@conx.ch> wrote:
>
>> i'm using solr 4.3.1 and the dataimporter. i am trying to use
>> XPathEntityProcessor within the TikaEntityProcessor for indexing html-pages
>> but i'm getting this error for each document. i have also tried
>> dataField="tika.text" and dataField="text" to no avail. the nested
>> XPathEntityProcessor "detail" creates the error, the rest works fine. what
>> am i doing wrong?
>>
>> error:
>>
>> ERROR - 2013-09-26 12:08:49.006; org.apache.solr.handler.dataimport.SqlEntityProcessor; The query failed 'null'
>> java.lang.ClassCastException: java.io.StringReader cannot be cast to java.util.Iterator
>>     at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
>>     at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
>>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
>>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
>>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
>>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
>>     at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
>>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
>>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
>>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
>>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
>>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
>>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
>>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>>     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>>     at org.eclipse.jetty.server.Server.handle(Server.java:365)
>>     at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
>>     at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>>     at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
>>     at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
>>     at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
>>     at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>>     at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>>     at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>>     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>>     at java.lang.Thread.run(Unknown Source)
>> ERROR - 2013-09-26 12:08:49.022; org.apache.solr.common.SolrException; Exception in entity : detail:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.StringReader cannot be cast to java.util.Iterator
>>     at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:65)
>>     at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
>>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
>>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
>>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
>>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
>>     at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
>>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
>>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
>>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
>>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
>>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
>>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
>>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
>>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>>     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>>     at org.eclipse.jetty.server.Server.handle(Server.java:365)
>>     at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
>>     at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>>     at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
>>     at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
>>     at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
>>     at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>>     at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>>     at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>>     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>>     at java.lang.Thread.run(Unknown Source)
>> Caused by: java.lang.ClassCastException: java.io.StringReader cannot be cast to java.util.Iterator
>>     at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
>>     ... 41 more
>>
>>
>>
>> data-config.xml
>>
>> <dataConfig>
>>   <dataSource type="BinURLDataSource" name="dataFile"/>
>>   <dataSource type="BinURLDataSource" name="dataUrl"/>
>>   <dataSource type="URLDataSource" name="main"/>
>>   <dataSource type="FieldReaderDataSource" name="fld"/>
>>   <document>
>>     <entity name="rec" processor="XPathEntityProcessor"
>>             url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportUrl.xml"
>>             forEach="/docs/doc" dataSource="main">
>>       <field column="title" xpath="//title" />
>>       <field column="id" xpath="//id" />
>>       <field column="file" xpath="//file" />
>>       <field column="url" xpath="//url" />
>>       <field column="urlParse" xpath="//urlParse" />
>>       <field column="last_modified" xpath="//last_modified" />
>>       <field column="Author" xpath="//author" />
>>
>>       <entity name="tika" processor="TikaEntityProcessor"
>>               url="${rec.urlParse}" dataSource="dataUrl" onError="skip" format="html">
>>         <field column="text"/>
>>
>>         <entity name="detail" type="XPathEntityProcessor"
>>                 forEach="/html" dataSource="fld" dataField="${tika.text}" rootEntity="true"
>>                 onError="skip">
>>           <field xpath="//h1" column="h_1" />
>>         </entity>
>>       </entity>
>>     </entity>
>>   </document>
>> </dataConfig>
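
A note on the quoted trace and config: the failing frames are all in SqlEntityProcessor, which is what DIH uses when an entity has no processor attribute, and the "detail" entity declares type="XPathEntityProcessor" rather than processor="XPathEntityProcessor". Below is an untested sketch of how the nested entity might be declared instead. It assumes the "fld" FieldReaderDataSource definition is kept, and that dataField takes the plain entity.column name (tika.text) rather than a ${...} expression, as in the FieldReaderDataSource examples on the Solr wiki. Whether Tika's HTML output then parses cleanly with forEach="/html" is not confirmed anywhere in this thread.

```xml
<!-- Untested sketch; only the "detail" entity differs from the quoted config. -->
<entity name="tika" processor="TikaEntityProcessor"
        url="${rec.urlParse}" dataSource="dataUrl" onError="skip" format="html">
  <field column="text"/>

  <!-- processor= instead of type=, so DIH does not fall back to SqlEntityProcessor -->
  <!-- dataField="tika.text" is an assumption: entity.column read by the "fld" FieldReaderDataSource -->
  <entity name="detail" processor="XPathEntityProcessor"
          forEach="/html" dataSource="fld" dataField="tika.text"
          rootEntity="true" onError="skip">
    <field xpath="//h1" column="h_1" />
  </entity>
</entity>
```

If the import still fails with this shape, the remaining suspects would be the content of the tika.text column itself rather than the choice of entity processor.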