Hello,

I have been given the task of indexing PDF files stored in a SqlBase
database with Solr 7.7.1. I have done half of the job: I can index all the
table fields and search them, except for the field that stores the PDF file
content. As I am totally new to Solr, I have spent a lot of time
unsuccessfully trying to understand how to make Solr extract and index the
field with the PDF content. I need some help.

Regards,

Aruna

In solrconfig.xml I have:

<lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />

<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

In db-data-config.xml I have:

<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="jdbc.unify.sqlbase.SqlbaseDriver"
              url="jdbc:sqlbase://localhost:2155/PDFDOCS"
              user="sysadm" password="sysadm" />
  <document>
    <entity name="PDFDOCUMENTS" query="select ID, PDOCUMENT, UNIT from SYSADM.DOCS">
      <field column="ID" name="idx" />
      <field column="PDOCUMENT" name="PDF" />
      <field column="UNIT" name="division" />
    </entity>
  </document>
</dataConfig>
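One possible approach, sketched here under assumptions rather than as a confirmed fix. The DataImportHandler does not pass a BLOB column through Tika by itself, so mapping `PDOCUMENT` straight to `PDF` never produces searchable text. A nested entity that uses `TikaEntityProcessor` together with a `FieldStreamDataSource` can read the binary column of each parent row and extract its text. The data source names `db` and `blob` and the entity name `pdfcontent` are hypothetical. The `dataField` value assumes the SqlBase driver reports the column as `PDOCUMENT` in upper case.

```xml
<dataConfig>
  <!-- JDBC connection from the original config, now given a name (hypothetical: "db") -->
  <dataSource name="db" type="JdbcDataSource"
              driver="jdbc.unify.sqlbase.SqlbaseDriver"
              url="jdbc:sqlbase://localhost:2155/PDFDOCS"
              user="sysadm" password="sysadm" />
  <!-- Streams binary data from a column of the parent entity's current row -->
  <dataSource name="blob" type="FieldStreamDataSource" />
  <document>
    <entity name="PDFDOCUMENTS" dataSource="db"
            query="select ID, PDOCUMENT, UNIT from SYSADM.DOCS">
      <field column="ID" name="idx" />
      <field column="UNIT" name="division" />
      <!-- Nested entity (hypothetical name): hands the PDOCUMENT BLOB to Tika
           and maps the extracted plain text (column "text") to the PDF field.
           Assumes the driver reports the column name in upper case. -->
      <entity name="pdfcontent" dataSource="blob"
              processor="TikaEntityProcessor"
              dataField="PDFDOCUMENTS.PDOCUMENT" format="text">
        <field column="text" name="PDF" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

`TikaEntityProcessor` ships in the `solr-dataimporthandler-extras` jar under `dist/`, which the existing `solr-dataimporthandler-.*\.jar` pattern should already match. The Tika libraries themselves come from `contrib/extraction/lib`, which is also loaded. The `/update/extract` handler is only used when files are posted to Solr over HTTP; the DataImportHandler does not call it. For the extracted content to be searchable, `PDF` would need to be a text field in the schema (for example of type `text_general`), and a new full import would be needed after the config change.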
