Thanks, Alexandre. My role is to kick the tires on this, and we're trying
it a couple of different ways. So I'm going to assume this could be
resolved and move on to trying ManifoldCF to see whether it can do similar
things for me, i.e. what it adds for free to our bag of tricks.

On Fri, Oct 10, 2014 at 3:16 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> I would concentrate on the stack traces and try reading them. They
> often provide a lot of clues. For example, your original stack trace
> had
>
>
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
>    at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:283)
>    at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:240)
> 2) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:44)
>    at org.apache.solr.handler.dataimport.DebugLogger$2.getData(DebugLogger.java:188)
> 1) at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:112)
>    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
>
> I added 1) and 2) to mark the important lines. You can see in 1)
> that your TikaEntityProcessor is calling 2) JdbcDataSource, which is
> not what you wanted, as you specified BinURLDataSource. So, focus on
> that until it gets resolved.
>
> Sometimes this happens when the XML file says 'datasource' instead of
> 'dataSource' (DIH is case-sensitive), but that does not seem to be the
> case in your situation.
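>
> For reference, here is a minimal sketch of the layout being discussed:
> a JDBC root entity that supplies the URL, with a nested
> TikaEntityProcessor entity bound to a named BinURLDataSource. Only the
> data source name "bin" and the entity name "extract" come from this
> thread. The SQLite path, the table and column names, the data source
> name "db" and the entity name "root" are assumptions for illustration.
>
> ```xml
> <dataConfig>
>   <!-- Hypothetical JDBC source; driver, path and name are placeholders -->
>   <dataSource name="db" type="JdbcDataSource"
>               driver="org.sqlite.JDBC"
>               url="jdbc:sqlite:/path/to/example.db" />
>   <!-- Binary source used to fetch each URL; note the capital S in dataSource -->
>   <dataSource name="bin" type="BinURLDataSource" />
>   <document>
>     <!-- Root entity reads the rows; table and columns are hypothetical -->
>     <entity name="root" dataSource="db" query="SELECT id, url FROM pages">
>       <field column="id" name="id" />
>       <!-- Nested entity must name "bin" explicitly, or it may end up
>            resolving to the JDBC source as in the stack trace -->
>       <entity name="extract" processor="TikaEntityProcessor"
>               dataSource="bin" url="${root.url}" format="text">
>         <field column="text" name="text" />
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
> ```
>
> With Oracle, unquoted column names are typically reported in upper
> case, so the reference may need to be ${root.URL} there.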
>
> Regards,
>     Alex.
> P.S. If you still haven't figured it out, mention the Solr version in
> the next email. Sometimes it makes a difference, though DIH has been
> largely unchanged for a while.
>
> ---------- Forwarded message ----------
> From: Dan Davis <d...@danizen.net>
> Date: 10 October 2014 15:00
> Subject: Re: Tika Integration problem with DIH and JDBC
> To: Alexandre Rafalovitch <arafa...@gmail.com>
>
>
> The definition of dataSource name="bin" type="BinURLDataSource" is in
> each of the dih-*.xml files, but only the XML version has the
> definition at the top, above the document.
>
> Moving the dataSource definition to the top does change the behavior.
> Now I get the following error for that entity:
>
> Exception in entity :
> extract:org.apache.solr.handler.dataimport.DataImportHandlerException:
> JDBC URL or JNDI name has to be specified Processing Document # 30
>
> When I changed it to specify url="", it reverted to the earlier form:
>
> Exception in entity :
> extract:org.apache.solr.handler.dataimport.DataImportHandlerException:
> Unable to execute query: http://www.cdc.gov/flu/swineflu/ Processing
> Document # 1
> at
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
>
> It does seem to be a problem resolving the dataSource in some way, so I
> double-checked another part of solrconfig.xml. Since the XML example
> still works, I guess I know it has to be there.
>
>   <lib dir="${solr.solr.home:}/dist/" regex="solr-dataimporthandler-.*\.jar" />
>
>   <lib dir="${solr.solr.home:}/contrib/extraction/lib" regex=".*\.jar" />
>   <lib dir="${solr.solr.home:}/dist/" regex="solr-cell-\d.*\.jar" />
>
>   <lib dir="${solr.solr.home:}/contrib/clustering/lib/" regex=".*\.jar" />
>   <lib dir="${solr.solr.home:}/dist/" regex="solr-clustering-\d.*\.jar" />
>
>   <lib dir="${solr.solr.home:}/contrib/langid/lib/" regex=".*\.jar" />
>   <lib dir="${solr.solr.home:}/dist/" regex="solr-langid-\d.*\.jar" />
>
>   <lib dir="${solr.solr.home:}/contrib/velocity/lib" regex=".*\.jar" />
>   <lib dir="${solr.solr.home:}/dist/" regex="solr-velocity-\d.*\.jar" />
>
>
> On Fri, Oct 10, 2014 at 2:37 PM, Alexandre Rafalovitch
> <arafa...@gmail.com> wrote:
> >
> > You say "dataSource='bin'" but I don't see you defining that datasource.
> > E.g.:
> >
> > <dataSource type="BinURLDataSource" name="bin"/>
> >
> > So, there might be some weird default fallback that just causes
> > strange problems.
> >
> > Regards,
> >     Alex.
> >
> > Personal: http://www.outerthoughts.com/ and @arafalov
> > Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
> >
> >
> > On 10 October 2014 14:17, Dan Davis <dansm...@gmail.com> wrote:
> > >
> > > What I want to do is to pull a URL out of an Oracle database, and then
> > > use TikaEntityProcessor and BinURLDataSource to go fetch and process
> > > that URL. I'm having a problem with this that seems general to JDBC
> > > with Tika - I get an exception as follows:
> > >
> > > Exception in entity :
> > > extract:org.apache.solr.handler.dataimport.DataImportHandlerException:
> > > Unable to execute query: http://www.cdc.gov/healthypets/pets/wildlife.html
> > > Processing Document # 14
> > >       at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
> > > ...
> > >
> > > Steps to reproduce the problem should be:
> > >
> > > 1. Try it with the XML and verify you get two documents and they
> > >    contain text (schema browser with the text field).
> > > 2. Try it with a JDBC sqlite3 dataSource and verify that you get an
> > >    exception, and advise me what may be the problem in my
> > >    configuration ...
> > >
> > > Now, I've tried this 3 ways:
> > >
> > > 1. My Oracle database - fails as above.
> > > 2. An SQLite3 database, to see if it is Oracle specific - fails with
> > >    "Unable to execute query", but doesn't have the URL as part of the
> > >    message.
> > > 3. An XML file listing two URLs - succeeds without error.
> > >
> > > For the SQL attempts, setting onError="skip" leads the data from the
> > > database to be indexed, but the exception is logged for each root
> > > entity. I can tell that nothing is indexed from the text extraction by
> > > browsing the "text" field from the schema browser and seeing how few
> > > terms there are. The exceptions also sort of give it away, but it is
> > > good to be careful :)
> > >
> > > This is using:
> > >
> > > Tomcat 7.0.55
> > > Solr 4.10.1
> > > and JDBC drivers
> > >
> > > ojdbc7.jar
> > > sqlite-jdbc-3.7.2.jar
> > >
> > > Excerpt of solrconfig.xml:
> > >
> > >   <!-- Data Import Handler for Health Topics -->
> > >   <requestHandler name="/dih-healthtopics" class="solr.DataImportHandler">
> > >     <lst name="defaults">
> > >       <str name="config">dih-healthtopics.xml</str>
> > >     </lst>
> > >   </requestHandler>
> > >
> > >   <!-- Data Import Handler that imports a single URL via Tika -->
> > >   <requestHandler name="/dih-smallxml" class="solr.DataImportHandler">
> > >     <lst name="defaults">
> > >       <str name="config">dih-smallxml.xml</str>
> > >     </lst>
> > >   </requestHandler>
> > >
> > >   <!-- Data Import Handler that imports a single URL via Tika -->
> > >   <requestHandler name="/dih-smallsqlite" class="solr.DataImportHandler">
> > >     <lst name="defaults">
> > >       <str name="config">dih-smallsqlite.xml</str>
> > >     </lst>
> > >   </requestHandler>
> > >
> > >
> > > The data import handlers and a copy-paste from Solr logging are
> > > attached.
>
