Re: Solr OpenNLP named entity extraction

Steve Rowe Mon, 09 Jul 2018 14:20:10 -0700

Hi Jerome,

See the ref guide[1] for a writeup of how to enable uploading files larger than 
1MB into ZooKeeper.


Local storage should also work. Have you tried placing the OpenNLP model files
in ${solr.solr.home}/lib/? If so, make sure you do the same on each node.

[1] https://lucene.apache.org/solr/guide/7_4/setting-up-an-external-zookeeper-ensemble.html#increasing-the-file-size-limit

--
Steve
www.lucidworks.com

> On Jul 9, 2018, at 12:50 AM, Jerome Yang <jey...@pivotal.io> wrote:
> 
> Hi guys,
> 
> In SolrCloud mode, where should the OpenNLP models go?
> Should they be uploaded to ZooKeeper?
> When I tested on Solr 7.3.1, an absolute path on the local host did not
> seem to work, and a model cannot be uploaded to ZooKeeper if it exceeds 1MB.
> 
> Regards,
> Jerome
> 
> On Wed, Apr 18, 2018 at 9:54 AM Steve Rowe <sar...@gmail.com> wrote:
> 
>> Hi Alexey,
>> 
>> First, thanks for moving the conversation to the mailing list.  Discussion
>> of usage problems should take place here rather than in JIRA.
>> 
>> I locally set up Solr 7.3 similarly to you and was able to get things to
>> work.
>> 
>> Problems with your setup:
>> 
>> 1. Your update chain is missing the Log and Run update processors at the
>> end (I see these are missing from the example in the javadocs for the
>> OpenNLP NER update processor; I’ll fix that):
>> 
>>     <processor class="solr.LogUpdateProcessorFactory" />
>>     <processor class="solr.RunUpdateProcessorFactory" />
>> 
>>   The Log update processor isn’t strictly necessary, but, from
>>   <https://lucene.apache.org/solr/guide/7_3/update-request-processors.html#custom-update-request-processor-chain>:
>> 
>>       Do not forget to add RunUpdateProcessorFactory at the end of any
>>       chains you define in solrconfig.xml. Otherwise update requests
>>       processed by that chain will not actually affect the indexed data.
>> 
>> 2. Your example document is missing an “id” field.
>> 
>> 3. For whatever reason, the pre-trained model "en-ner-person.bin" doesn’t
>> extract anything from the text “This is Steve Jobs 2”.  It does, however,
>> extract “Steve Jobs” from e.g. the text “This is Steve Jobs in white”.
>> 
>> 4. (Not a problem necessarily) You may want to use a multi-valued “string”
>> field for the “dest” field in your update chain, e.g. “people_str” (“*_str”
>> in the default configset is so configured).
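>> 
>> Putting points 1 and 4 together, the chain might look roughly like the
>> sketch below. It is untested as written here; it assumes the "text_opennlp"
>> field type and the model path from the original message are valid, and it
>> uses "people_str" only as an example "dest" name:
>> 
>> ```xml
>> <updateRequestProcessorChain name="multiple-extract">
>>   <!-- Extract person names from description_en into people_str -->
>>   <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
>>     <str name="modelFile">opennlp/en-ner-person.bin</str>
>>     <str name="analyzerFieldType">text_opennlp</str>
>>     <str name="source">description_en</str>
>>     <str name="dest">people_str</str>
>>   </processor>
>>   <!-- Optional: logs the update request -->
>>   <processor class="solr.LogUpdateProcessorFactory" />
>>   <!-- Required: without it, updates do not affect the indexed data -->
>>   <processor class="solr.RunUpdateProcessorFactory" />
>> </updateRequestProcessorChain>
>> ```
>> 
>> A test document for it that covers points 2 and 3 could then be as small
>> as this (the "id" value is made up for illustration):
>> 
>> ```xml
>> <add>
>>   <doc>
>>     <!-- Hypothetical unique key value -->
>>     <field name="id">doc-1</field>
>>     <field name="description_en">This is Steve Jobs in white</field>
>>   </doc>
>> </add>
>> ```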
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Apr 17, 2018, at 8:23 AM, Alexey Ponomarenko <alex1989s...@gmail.com> wrote:
>>> 
>>> Hi once more. I am trying to implement named entity extraction using this
>>> manual:
>>> 
>>> https://lucene.apache.org/solr/7_3_0//solr-analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html
>>> 
>>> I modified solrconfig.xml like this:
>>> 
>>> <updateRequestProcessorChain name="multiple-extract">
>>>  <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
>>>    <str name="modelFile">opennlp/en-ner-person.bin</str>
>>>    <str name="analyzerFieldType">text_opennlp</str>
>>>    <str name="source">description_en</str>
>>>    <str name="dest">content</str>
>>>  </processor>
>>> </updateRequestProcessorChain>
>>> 
>>> Then I tried to add data using:
>>> 
>>> *request:*
>>> 
>>> POST http://localhost:8983/solr/numberplate/update?version=2.2&wt=xml&update.chain=multiple-extract
>>> 
>>> <add><doc>
>>>   <field name="description_en">This is Steve Jobs 2</field>
>>>   <field name="content_pos">This is text 2</field>
>>>   <field name="content">This is text for content 2</field>
>>> </doc></add>
>>> 
>>> *response*
>>> 
>>> <?xml version="1.0" encoding="UTF-8"?>
>>> <response>
>>>   <lst name="responseHeader">
>>>       <int name="status">0</int>
>>>       <int name="QTime">3</int>
>>>   </lst>
>>> </response>
>>> 
>>> But I don't see any data inserted into the *content* field or any other
>>> field.
>>> 
>>> *If you need some additional data I can provide it.*
>>> 
>>> Can you help me? What have I done wrong?
>> 
>> 
> 
> -- 
> Pivotal Greenplum | Pivotal Software, Inc. <https://pivotal.io/>
