I am trying to implement the NLP functionality within the Solr
ExtractingRequestHandler and the Tika framework.
I am indexing PDF documents and have been successful in extracting and
indexing the content, but I have not been successful in engaging the NLP
routines. I have reached the point where I am even trying to generate an
exception just to validate my understanding of the interfaces.
I have included parts of my solrconfig.xml and my Tika config file below.
I am using the techproducts example and Solr 6.3.0.


solrconfig.xml
--- NLP Models  en-ner-organization.bin etc

  <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.bin" />

  <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />

solrconfig.xml

  <requestHandler name="/update/extract"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler" >
    <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="uprefix">attr_</str>
      <str name="tika.config">tika-config.xml</str>
      <!-- capture link hrefs but ignore div attributes -->
      <str name="captureAttr">true</str>
      <str name="fmap.a">links</str>
      <str name="fmap.div">ignored_</str>
    </lst>
  </requestHandler>
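
For reference, a request along the following lines is one way to see what Tika returns through this handler without indexing anything. It is only a sketch: the host, port, core name and the sample PDF path are assumptions based on a default techproducts install, and SOLR_URL, PDF and EXTRACT_URL are just illustrative variable names. The extractOnly and extractFormat parameters belong to the ExtractingRequestHandler. If the NER parser were being engaged, I would expect extra metadata entries in the response (I am assuming keys prefixed with NER_, which I have not been able to confirm).

```shell
# Assumed defaults: a local techproducts core and the sample PDF shipped with Solr.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/techproducts}"
PDF="${PDF:-example/exampledocs/solr-word.pdf}"

# extractOnly=true returns Tika's output without indexing the document;
# extractFormat=text asks for plain text instead of XHTML.
EXTRACT_URL="$SOLR_URL/update/extract?extractOnly=true&extractFormat=text&wt=json&indent=true"

if [ -f "$PDF" ]; then
  # Post the file as multipart form data and print the response.
  curl -s "$EXTRACT_URL" -F "myfile=@$PDF"
else
  echo "PDF not found: $PDF" >&2
fi
```

The response should contain the extracted text plus a metadata section for the uploaded file, which makes it easier to tell whether only the default PDF parsing ran.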


Tika-Config.xml

<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <parsers>
        <parser class="org.apache.tika.parser.ner.NamedEntityParser">
            <mime>text/plain</mime>
            <mime>text/html</mime>
            <mime>application/xhtml+xml</mime>
            <mime>application/pdf</mime>
        </parser>
    </parsers>
</properties>
