You say "dataSource='bin'" but I don't see you defining that datasource. E.g.:
<dataSource type="BinURLDataSource" name="bin"/>

So, there might be some weird default fallback that just causes strange
problems.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 10 October 2014 14:17, Dan Davis <dansm...@gmail.com> wrote:
>
> What I want to do is to pull an URL out of an Oracle database, and then use
> TikaEntityProcessor and BinURLDataSource to go fetch and process that URL.
> I'm having a problem with this that seems general to JDBC with Tika - I get
> an exception as follows:
>
>     Exception in entity :
>     extract:org.apache.solr.handler.dataimport.DataImportHandlerException:
>     Unable to execute query: http://www.cdc.gov/healthypets/pets/wildlife.html
>     Processing Document # 14
>         at
>     org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
>     ...
>
> Steps to reproduce any problem should be:
>
> 1. Try it with the XML and verify you get two documents and they contain
>    text (schema browser with the text field)
> 2. Try it with a JDBC sqlite3 dataSource and verify that you get an
>    exception, and advise me what may be the problem in my configuration ...
>
> Now, I've tried this 3 ways:
>
> 1. My Oracle database - fails as above
> 2. An SQLite3 database to see if it is Oracle specific - fails with
>    "Unable to execute query", but doesn't have the URL as part of the
>    message.
> 3. An XML file listing two URLs - succeeds without error.
>
> For the SQL attempts, setting onError="skip" leads the data from the
> database to be indexed, but the exception is logged for each root entity.
> I can tell that nothing is indexed from the text extraction by browsing the
> "text" field from the schema browser and seeing how few terms there are.
> The exceptions also sort of give it away, but it is good to be careful :)
>
> This is using:
>
>     Tomcat 7.0.55
>     Solr 4.10.1
>
> and JDBC drivers
>
>     ojdbc7.jar
>     sqlite-jdbc-3.7.2.jar
>
> Excerpt of solrconfig.xml:
>
>   <!-- Data Import Handler for Health Topics -->
>   <requestHandler name="/dih-healthtopics" class="solr.DataImportHandler">
>     <lst name="defaults">
>       <str name="config">dih-healthtopics.xml</str>
>     </lst>
>   </requestHandler>
>
>   <!-- Data Import Handler that imports a single URL via Tika -->
>   <requestHandler name="/dih-smallxml" class="solr.DataImportHandler">
>     <lst name="defaults">
>       <str name="config">dih-smallxml.xml</str>
>     </lst>
>   </requestHandler>
>
>   <!-- Data Import Handler that imports a single URL via Tika -->
>   <requestHandler name="/dih-smallsqlite" class="solr.DataImportHandler">
>     <lst name="defaults">
>       <str name="config">dih-smallsqlite.xml</str>
>     </lst>
>   </requestHandler>
>
> The data import handlers and a copy-paste from Solr logging are attached.
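
To make the suggestion at the top of this reply concrete, here is a minimal sketch of a DIH config that defines both data sources by name and points each entity at one of them explicitly. It is not taken from the attached files: the entity name "root", the table "pages", its columns "id" and "url", and the SQLite path are placeholders made up for illustration. Only the "extract" entity name and the "bin" data source name come from this thread.

```xml
<dataConfig>
  <!-- JDBC source for the rows that carry the URL.
       Driver class and path are assumptions; substitute your own. -->
  <dataSource name="db" type="JdbcDataSource"
              driver="org.sqlite.JDBC"
              url="jdbc:sqlite:/path/to/example.db"/>

  <!-- Binary source that TikaEntityProcessor reads from.
       The name must match the dataSource attribute on the Tika entity. -->
  <dataSource name="bin" type="BinURLDataSource"/>

  <document>
    <!-- Outer entity: reads id and url via JDBC (hypothetical table/columns) -->
    <entity name="root" dataSource="db" query="SELECT id, url FROM pages">
      <field column="id" name="id"/>

      <!-- Inner entity: fetches the URL through "bin" and extracts text -->
      <entity name="extract" processor="TikaEntityProcessor"
              dataSource="bin" url="${root.url}" format="text">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

If the inner entity cannot find a data source named "bin", my guess is that it ends up on the JDBC one, which would then be handed the URL as if it were SQL. That would fit the "Unable to execute query: <URL>" message, but I have not verified it against your setup. Also note that Oracle typically reports column names in upper case, so the variable may need to be ${root.URL} there.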