i'm already using URLDataSource On 30. Sep 2013, at 5:41 PM, P Williams wrote:
> Hi Andreas, > > When using > XPathEntityProcessor<http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor>your > DataSource > must be of type DataSource<Reader>. You shouldn't be using > BinURLDataSource, it's giving you the cast exception. Use > URLDataSource<https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-dataimporthandler/org/apache/solr/handler/dataimport/URLDataSource.html> > or > FileDataSource<https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-dataimporthandler/org/apache/solr/handler/dataimport/FileDataSource.html>instead. > > I don't think you need to specify namespaces, at least you didn't used to. > The other thing that I've noticed is that the anywhere xpath expression // > doesn't always work in DIH. You might have to be more specific. > > Cheers, > Tricia > > > > > > On Sun, Sep 29, 2013 at 9:47 AM, Andreas Owen <a...@conx.ch> wrote: > >> how dum can you get. obviously quite dum... i would have to analyze the >> html-pages with a nested instance like this: >> >> <entity name="rec" processor="XPathEntityProcessor" >> url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportUrl.xml" >> forEach="/docs/doc" dataSource="main"> >> >> <entity name="htm" processor="XPathEntityProcessor" >> url="${rec.urlParse}" forEach="/xhtml:html" dataSource="dataUrl"> >> <field column="text" xpath="//content" /> >> <field column="h_2" xpath="//body" /> >> <field column="text_nohtml" xpath="//text" /> >> <field column="h_1" xpath="//h:h1" /> >> </entity> >> </entity> >> >> but i'm pretty sure the foreach is wrong and the xpath expressions. in the >> moment i getting the following error: >> >> Caused by: java.lang.RuntimeException: >> org.apache.solr.handler.dataimport.DataImportHandlerException: >> java.lang.ClassCastException: >> sun.net.www.protocol.http.HttpURLConnection$HttpInputStream cannot be cast >> to java.io.Reader >> >> >> >> >> >> On 28. Sep 2013, at 1:39 AM, Andreas Owen wrote: >> >>> ok i see what your getting at but why doesn't the following work: >>> >>> <field xpath="//h:h1" column="h_1" /> >>> <field column="text" xpath="/xhtml:html/xhtml:body" /> >>> >>> i removed the tiki-processor. what am i missing, i haven't found >> anything in the wiki? >>> >>> >>> On 28. Sep 2013, at 12:28 AM, P Williams wrote: >>> >>>> I spent some more time thinking about this. Do you really need to use >> the >>>> TikaEntityProcessor? It doesn't offer anything new to the document you >> are >>>> building that couldn't be accomplished by the XPathEntityProcessor alone >>>> from what I can tell. >>>> >>>> I also tried to get the Advanced >>>> Parsing<http://wiki.apache.org/solr/TikaEntityProcessor>example to >>>> work without success. There are some obvious typos (<document> >>>> instead of </document>) and an odd order to the pieces (<dataSources> is >>>> enclosed by <document>). It also looks like >>>> FieldStreamDataSource< >> http://lucene.apache.org/solr/4_3_1/solr-dataimporthandler/org/apache/solr/handler/dataimport/FieldStreamDataSource.html >>> is >>>> the one that is meant to work in this context. If Koji is still around >>>> maybe he could offer some help? Otherwise this bit of erroneous >>>> instruction should probably be removed from the wiki. >>>> >>>> Cheers, >>>> Tricia >>>> >>>> $ svn diff >>>> Index: >>>> >> solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java >>>> =================================================================== >>>> --- >>>> >> solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java >>>> (revision 1526990) >>>> +++ >>>> >> solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java >>>> (working copy) >>>> @@ -99,13 +99,13 @@ >>>> runFullImport(getConfigHTML("identity")); >>>> assertQ(req("*:*"), testsHTMLIdentity); >>>> } >>>> - >>>> + >>>> private String getConfigHTML(String htmlMapper) { >>>> return >>>> "<dataConfig>" + >>>> " <dataSource type='BinFileDataSource'/>" + >>>> " <document>" + >>>> - " <entity name='Tika' format='xml' >>>> processor='TikaEntityProcessor' " + >>>> + " <entity name='Tika' format='html' >>>> processor='TikaEntityProcessor' " + >>>> " url='" + >>>> getFile("dihextras/structured.html").getAbsolutePath() + "' " + >>>> ((htmlMapper == null) ? "" : (" htmlMapper='" + htmlMapper + >>>> "'")) + ">" + >>>> " <field column='text'/>" + >>>> @@ -114,4 +114,36 @@ >>>> "</dataConfig>"; >>>> >>>> } >>>> + private String[] testsHTMLH1 = { >>>> + "//*[@numFound='1']" >>>> + , "//str[@name='h1'][contains(.,'H1 Header')]" >>>> + }; >>>> + >>>> + @Test >>>> + public void testTikaHTMLMapperSubEntity() throws Exception { >>>> + runFullImport(getConfigSubEntity("identity")); >>>> + assertQ(req("*:*"), testsHTMLH1); >>>> + } >>>> + >>>> + private String getConfigSubEntity(String htmlMapper) { >>>> + return >>>> + "<dataConfig>" + >>>> + "<dataSource type='BinFileDataSource' name='bin'/>" + >>>> + "<dataSource type='FieldStreamDataSource' name='fld'/>" + >>>> + "<document>" + >>>> + "<entity name='tika' processor='TikaEntityProcessor' url='" + >>>> getFile("dihextras/structured.html").getAbsolutePath() + "' >>>> dataSource='bin' format='html' rootEntity='false'>" + >>>> + "<!--Do appropriate mapping here meta=\"true\" means it is a >>>> metadata field -->" + >>>> + "<field column='Author' meta='true' name='author'/>" + >>>> + "<field column='title' meta='true' name='title'/>" + >>>> + "<!--'text' is an implicit field emited by TikaEntityProcessor >> . >>>> Map it appropriately-->" + >>>> + "<field name='text' column='text'/>" + >>>> + "<entity name='detail' type='XPathEntityProcessor' >> forEach='/html' >>>> dataSource='fld' dataField='tika.text' rootEntity='true' >" + >>>> + "<field xpath='//div' column='foo'/>" + >>>> + "<field xpath='//h1' column='h1' />" + >>>> + "</entity>" + >>>> + "</entity>" + >>>> + "</document>" + >>>> + "</dataConfig>"; >>>> + } >>>> + >>>> } >>>> Index: >>>> >> solr/contrib/dataimporthandler-extras/src/test-files/dihextras/solr/collection1/conf/dataimport-schema-no-unique-key.xml >>>> =================================================================== >>>> --- >>>> >> solr/contrib/dataimporthandler-extras/src/test-files/dihextras/solr/collection1/conf/dataimport-schema-no-unique-key.xml >>>> (revision 1526990) >>>> +++ >>>> >> solr/contrib/dataimporthandler-extras/src/test-files/dihextras/solr/collection1/conf/dataimport-schema-no-unique-key.xml >>>> (working copy) >>>> @@ -194,6 +194,8 @@ >>>> <field name="title" type="string" indexed="true" stored="true"/> >>>> <field name="author" type="string" indexed="true" stored="true" /> >>>> <field name="text" type="text" indexed="true" stored="true" /> >>>> + <field name="h1" type="text" indexed="true" stored="true" /> >>>> + <field name="foo" type="text" indexed="true" stored="true" /> >>>> >>>> </fields> >>>> <!-- field for the QueryParser to use when an explicit fieldname is >>>> absent --> >>>> >>>> >>>> I find the SqlEntityProcessor part particularly odd. That's the default >>>> right?: >>>> 2405 T12 C1 oashd.SqlEntityProcessor.initQuery ERROR The query failed >>>> 'null' java.lang.RuntimeException: unsupported type : class >> java.lang.String >>>> at >>>> >> org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:89) >>>> at >>>> >> org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:1) >>>> at >>>> >> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) >>>> at >>>> >> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) >>>> at >>>> >> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243) >>>> at >>>> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:469) >>>> at >>>> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:495) >>>> at >>>> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:408) >>>> at >>>> >> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:323) >>>> at >>>> >> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:231) >>>> at >>>> >> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411) >>>> at >>>> >> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:476) >>>> at >>>> >> org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179) >>>> at >>>> >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) >>>> at org.apache.solr.util.TestHarness.query(TestHarness.java:291) >>>> at >>>> >> org.apache.solr.handler.dataimport.AbstractDataImportHandlerTestCase.runFullImport(AbstractDataImportHandlerTestCase.java:96) >>>> at >>>> >> org.apache.solr.handler.dataimport.TestTikaEntityProcessor.testTikaHTMLMapperSubEntity(TestTikaEntityProcessor.java:124) >>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>> at >>>> >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>>> at >>>> >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>>> at java.lang.reflect.Method.invoke(Method.java:601) >>>> at >>>> >> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) >>>> at >>>> >> com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) >>>> at >>>> >> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) >>>> at >>>> >> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) >>>> at >>>> >> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) >>>> at >>>> >> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) >>>> at >>>> >> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) >>>> at >>>> >> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) >>>> at >>>> >> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) >>>> at >>>> >> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) >>>> at >>>> >> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) >>>> at >>>> >> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) >>>> at >>>> >> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) >>>> at >>>> >> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) >>>> at >>>> >> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) >>>> at >>>> >> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) >>>> at >>>> >> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) >>>> at >>>> >> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) >>>> at >>>> >> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) >>>> at >>>> >> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) >>>> at >>>> >> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) >>>> at >>>> >> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) >>>> at >>>> >> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) >>>> at >>>> >> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) >>>> at >>>> >> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) >>>> at >>>> >> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) >>>> at >>>> >> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) >>>> at >>>> >> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) >>>> at >>>> >> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) >>>> at >>>> >> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) >>>> at >>>> >> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) >>>> at >>>> >> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) >>>> at >>>> >> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) >>>> at >>>> >> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) >>>> at >>>> >> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) >>>> at java.lang.Thread.run(Thread.java:722) >>>> >>>> >>>> >>>> On Fri, Sep 27, 2013 at 3:55 AM, Andreas Owen <a...@conx.ch> wrote: >>>> >>>>> i removed the FieldReaderDataSource and dataSource="fld" but it didn't >>>>> help. i get the following for each document: >>>>> DataImportHandlerException: Exception in invoking url null >>>>> Processing Document # 9 >>>>> nullpointerexception >>>>> >>>>> >>>>> On 26. Sep 2013, at 8:39 PM, P Williams wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> Haven't tried this myself but maybe try leaving out the >>>>>> FieldReaderDataSource entirely. From my quick searching looks like >> it's >>>>>> tied to SQL. Did you try copying the >>>>>> http://wiki.apache.org/solr/TikaEntityProcessor Advanced Parsing >> example >>>>>> exactly? What happens when you leave out FieldReaderDataSource? >>>>>> >>>>>> Cheers, >>>>>> Tricia >>>>>> >>>>>> >>>>>> On Thu, Sep 26, 2013 at 4:17 AM, Andreas Owen <a...@conx.ch> wrote: >>>>>> >>>>>>> i'm using solr 4.3.1 and the dataimporter. i am trying to use >>>>>>> XPathEntityProcessor within the TikaEntityProcessor for indexing >>>>> html-pages >>>>>>> but i'm getting this error for each document. i have also tried >>>>>>> dataField="tika.text" and dataField="text" to no avail. the nested >>>>>>> XPathEntityProcessor "detail" creates the error, the rest works fine. >>>>> what >>>>>>> am i doing wrong? >>>>>>> >>>>>>> error: >>>>>>> >>>>>>> ERROR - 2013-09-26 12:08:49.006; >>>>>>> org.apache.solr.handler.dataimport.SqlEntityProcessor; The query >> failed >>>>>>> 'null' >>>>>>> java.lang.ClassCastException: java.io.StringReader cannot be cast to >>>>>>> java.util.Iterator >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >>>>>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) >>>>>>> at org.eclipse.jetty.server.Server.handle(Server.java:365) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998) >>>>>>> at >>>>> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856) >>>>>>> at >>>>>>> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) >>>>>>> at java.lang.Thread.run(Unknown Source) >>>>>>> ERROR - 2013-09-26 12:08:49.022; >> org.apache.solr.common.SolrException; >>>>>>> Exception in entity : >>>>>>> detail:org.apache.solr.handler.dataimport.DataImportHandlerException: >>>>>>> java.lang.ClassCastException: java.io.StringReader cannot be cast to >>>>>>> java.util.Iterator >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:65) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) >>>>>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359) >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) >>>>>>> at org.eclipse.jetty.server.Server.handle(Server.java:365) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998) >>>>>>> at >>>>> org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856) >>>>>>> at >>>>>>> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) >>>>>>> at >>>>>>> >>>>> >> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) >>>>>>> at java.lang.Thread.run(Unknown Source) >>>>>>> Caused by: java.lang.ClassCastException: java.io.StringReader cannot >> be >>>>>>> cast to java.util.Iterator >>>>>>> at >>>>>>> >>>>> >> org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59) >>>>>>> ... 41 more >>>>>>> >>>>>>> >>>>>>> >>>>>>> data-config.xml >>>>>>> >>>>>>> <dataConfig> >>>>>>> <dataSource type="BinURLDataSource" name="dataFile"/> >>>>>>> <dataSource type="BinURLDataSource" name="dataUrl"/> >>>>>>> <dataSource type="URLDataSource" name="main"/> >>>>>>> <dataSource type="FieldReaderDataSource" name="fld"/> >>>>>>> <document> >>>>>>> <entity name="rec" processor="XPathEntityProcessor" >>>>>>> >>>>> >> url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportUrl.xml" >>>>>>> forEach="/docs/doc" dataSource="main"> >>>>>>> <field column="title" xpath="//title" /> >>>>>>> <field column="id" xpath="//id" /> >>>>>>> <field column="file" xpath="//file" /> >>>>>>> <field column="url" xpath="//url" /> >>>>>>> <field column="urlParse" xpath="//urlParse" /> >>>>>>> <field column="last_modified" xpath="//last_modified" /> >>>>>>> <field column="Author" xpath="//author" /> >>>>>>> >>>>>>> <entity name="tika" processor="TikaEntityProcessor" >>>>>>> url="${rec.urlParse}" dataSource="dataUrl" onError="skip" >> format="html"> >>>>>>> <field column="text"/> >>>>>>> >>>>>>> <entity name="detail" >> type="XPathEntityProcessor" >>>>>>> forEach="/html" dataSource="fld" dataField="${tika.text}" >>>>> rootEntity="true" >>>>>>> onError="skip"> >>>>>>> <field xpath="//h1" column="h_1" /> >>>>>>> </entity> >>>>>>> </entity> >>>>>>> </entity> >>>>>>> </document> >>>>>>> </dataConfig> >>>>> >>>>> >> >>