Hi Alexey,

First, thanks for moving the conversation to the mailing list.  Discussion of 
usage problems should take place here rather than in JIRA.

I set up Solr 7.3 locally along the lines of your configuration and was able to 
get things to work.

Problems with your setup:

1. Your update chain is missing the Log and Run update processors at the end (I 
see these are missing from the example in the javadocs for the OpenNLP NER 
update processor; I’ll fix that):

     <processor class="solr.LogUpdateProcessorFactory" />
     <processor class="solr.RunUpdateProcessorFactory" />

   The Log update processor isn’t strictly necessary, but, from 
<https://lucene.apache.org/solr/guide/7_3/update-request-processors.html#custom-update-request-processor-chain>:

       Do not forget to add RunUpdateProcessorFactory at the end of any
       chains you define in solrconfig.xml. Otherwise update requests
       processed by that chain will not actually affect the indexed data.

2. Your example document is missing an “id” field.

3. For whatever reason, the pre-trained model "en-ner-person.bin" doesn’t 
extract anything from the text “This is Steve Jobs 2”.  It does, however, 
extract “Steve Jobs” from e.g. the text “This is Steve Jobs in white”.

4. (Not necessarily a problem) You may want to use a multi-valued “string” 
field for the “dest” field in your update chain, e.g. “people_str” (the “*_str” 
dynamic field in the default configset is configured this way).
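
Putting points 1, 2 and 4 together, a chain and test document along the 
following lines should behave as expected.  Treat this as a sketch rather than 
something to paste in verbatim: it assumes your schema has the “*_str” dynamic 
field from the default configset, that “text_opennlp” is defined as in your 
current setup, and that “id” is your unique key.  The id value “1” is just a 
placeholder.

```xml
<!-- solrconfig.xml: your chain plus the Log and Run processors at the end -->
<updateRequestProcessorChain name="multiple-extract">
  <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
    <str name="modelFile">opennlp/en-ner-person.bin</str>
    <str name="analyzerFieldType">text_opennlp</str>
    <str name="source">description_en</str>
    <!-- assumed: multi-valued string field via the *_str dynamic field -->
    <str name="dest">people_str</str>
  </processor>
  <!-- optional, logs each update handled by the chain -->
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- required, otherwise nothing is actually written to the index -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- update request body: includes an id (placeholder value) and a -->
<!-- sentence the pre-trained person model is able to tag -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="description_en">This is Steve Jobs in white</field>
  </doc>
</add>
```

Post the document with update.chain=multiple-extract as you did before, commit, 
and then query for the document to check whether “people_str” was populated.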

--
Steve
www.lucidworks.com

> On Apr 17, 2018, at 8:23 AM, Alexey Ponomarenko <alex1989s...@gmail.com> 
> wrote:
> 
> Hi once more I am trying to implement named entities extraction using this
> manual
> https://lucene.apache.org/solr/7_3_0//solr-analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html
> 
> I am modified solrconfig.xml like this:
> 
> <updateRequestProcessorChain name="multiple-extract">
>   <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
>     <str name="modelFile">opennlp/en-ner-person.bin</str>
>     <str name="analyzerFieldType">text_opennlp</str>
>     <str name="source">description_en</str>
>     <str name="dest">content</str>
>   </processor>
> </updateRequestProcessorChain>
> 
> But when I was trying to add data using:
> 
> *request:*
> 
> POST
> http://localhost:8983/solr/numberplate/update?version=2.2&wt=xml&update.chain=multiple-extract
> 
> <add><doc><field name="description_en">This is Steve Jobs 2
> </field><field name="content_pos">This is text 2</field><field
> name="content">This is text for content 2</field></doc></add>
> 
> *response*
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
>    <lst name="responseHeader">
>        <int name="status">0</int>
>        <int name="QTime">3</int>
>    </lst>
> </response>
> 
> But I don't see any data inserted to *content* field and in any other field.
> 
> *If you need some additional data I can provide it.*
> 
> Can you help me? What have I done wrong?
