I removed the FieldReaderDataSource and dataSource="fld", but it didn't help. I
get the following for each document:
        DataImportHandlerException: Exception in invoking url null Processing Document # 9
        NullPointerException


On 26. Sep 2013, at 8:39 PM, P Williams wrote:

> Hi,
> 
> Haven't tried this myself but maybe try leaving out the
> FieldReaderDataSource entirely.  From my quick searching looks like it's
> tied to SQL.  Did you try copying the
> http://wiki.apache.org/solr/TikaEntityProcessor Advanced Parsing example
> exactly?  What happens when you leave out FieldReaderDataSource?
> 
> Cheers,
> Tricia
> 
> 
> On Thu, Sep 26, 2013 at 4:17 AM, Andreas Owen <a...@conx.ch> wrote:
> 
>> I'm using Solr 4.3.1 and the DataImportHandler. I am trying to use
>> XPathEntityProcessor within the TikaEntityProcessor for indexing HTML pages,
>> but I'm getting this error for each document. I have also tried
>> dataField="tika.text" and dataField="text" to no avail. The nested
>> XPathEntityProcessor "detail" causes the error; the rest works fine. What
>> am I doing wrong?
>> 
>> error:
>> 
>> ERROR - 2013-09-26 12:08:49.006; org.apache.solr.handler.dataimport.SqlEntityProcessor; The query failed 'null'
>> java.lang.ClassCastException: java.io.StringReader cannot be cast to java.util.Iterator
>>        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
>>        at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
>>        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
>>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
>>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
>>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
>>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
>>        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
>>        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
>>        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
>>        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
>>        at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
>>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
>>        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
>>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
>>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
>>        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
>>        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>>        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>>        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
>>        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>>        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
>>        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
>>        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>>        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
>>        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>>        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>>        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>>        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>>        at org.eclipse.jetty.server.Server.handle(Server.java:365)
>>        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
>>        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>>        at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
>>        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
>>        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
>>        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>>        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>>        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>>        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>>        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>>        at java.lang.Thread.run(Unknown Source)
>> ERROR - 2013-09-26 12:08:49.022; org.apache.solr.common.SolrException; Exception in entity : detail:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: java.io.StringReader cannot be cast to java.util.Iterator
>>        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:65)
>>        at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
>>        at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
>>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465)
>>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
>>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491)
>>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
>>        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319)
>>        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227)
>>        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422)
>>        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487)
>>        at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
>>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
>>        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
>>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
>>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
>>        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
>>        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>>        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>>        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
>>        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>>        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
>>        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
>>        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>>        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
>>        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>>        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>>        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>>        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>>        at org.eclipse.jetty.server.Server.handle(Server.java:365)
>>        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
>>        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>>        at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
>>        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
>>        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
>>        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>>        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>>        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>>        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>>        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>>        at java.lang.Thread.run(Unknown Source)
>> Caused by: java.lang.ClassCastException: java.io.StringReader cannot be cast to java.util.Iterator
>>        at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
>>        ... 41 more
>> 
>> 
>> 
>> data-config.xml
>> 
>> <dataConfig>
>>        <dataSource type="BinURLDataSource" name="dataFile"/>
>>        <dataSource type="BinURLDataSource" name="dataUrl"/>
>>        <dataSource type="URLDataSource" name="main"/>
>>        <dataSource type="FieldReaderDataSource" name="fld"/>
>> <document>
>> <entity name="rec" processor="XPathEntityProcessor"
>> url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportUrl.xml"
>> forEach="/docs/doc" dataSource="main">
>>                <field column="title" xpath="//title" />
>>                <field column="id" xpath="//id" />
>>                <field column="file" xpath="//file" />
>>                <field column="url" xpath="//url" />
>>                <field column="urlParse" xpath="//urlParse" />
>>                <field column="last_modified" xpath="//last_modified" />
>>                <field column="Author" xpath="//author" />
>> 
>>                <entity name="tika" processor="TikaEntityProcessor"
>> url="${rec.urlParse}" dataSource="dataUrl" onError="skip" format="html">
>>                        <field column="text"/>
>> 
>>                        <entity name="detail" type="XPathEntityProcessor"
>> forEach="/html" dataSource="fld" dataField="${tika.text}" rootEntity="true"
>> onError="skip">
>>                                <field xpath="//h1" column="h_1" />
>>                        </entity>
>>                </entity>
>>        </entity>
>> </document>
>> </dataConfig>
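
One thing that stands out in the config above: the stack trace shows SqlEntityProcessor handling the "detail" entity, which is what the DataImportHandler falls back to when an entity has no processor attribute, and "detail" declares type="XPathEntityProcessor" rather than processor="XPathEntityProcessor". The fragment below is only a sketch of how the nested entities might look with that attribute renamed. It also assumes that dataField should name the column directly (tika.text) instead of using the ${...} placeholder, and that the FieldReaderDataSource named "fld" is still declared at the top of the file. It is untested against Solr 4.3.1 and may not be the whole fix.

```xml
<!-- Sketch only: nested entities from the config above, with two assumed changes -->
<entity name="tika" processor="TikaEntityProcessor"
        url="${rec.urlParse}" dataSource="dataUrl" onError="skip" format="html">
    <field column="text"/>

    <!-- Assumption 1: processor= instead of type=, so XPathEntityProcessor is
         actually selected instead of the default SqlEntityProcessor. -->
    <!-- Assumption 2: dataField names the parent column as entity.column,
         without the ${...} variable syntax. -->
    <entity name="detail" processor="XPathEntityProcessor"
            forEach="/html" dataSource="fld" dataField="tika.text"
            rootEntity="true" onError="skip">
        <field xpath="//h1" column="h_1" />
    </entity>
</entity>
```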
