Hi Jerome,

I was able to set up a configset to perform OpenNLP NER, loading the model files 
from local storage.

There is a trick though[1]: the model files must be located *in a jar* or *in a 
subdirectory* under ${solr.solr.home}/lib/ or under a directory specified via a 
solrconfig.xml <lib> directive.

I tested with the bin/solr cloud example, and put the model files under both 
Solr home directories, at example/cloud/node1/solr/lib/opennlp/ and 
example/cloud/node2/solr/lib/opennlp/.  The “opennlp/” subdirectory is 
required, though you can give it any name you choose.
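
A minimal Python sketch of the copy step described above, in case it helps to script it across nodes. The function name stage_models and its parameters are made up for illustration and are not part of Solr; the sketch only copies files into a subdirectory of each Solr home's lib/ directory.

```python
import shutil
from pathlib import Path


def stage_models(model_files, solr_homes, subdir="opennlp"):
    """Copy each model file into <solr_home>/lib/<subdir>/ for every Solr home.

    The subdirectory name is arbitrary, but the files must not sit directly
    in lib/. Returns the list of destination paths that were written.
    """
    written = []
    for home in solr_homes:
        target_dir = Path(home) / "lib" / subdir
        # Create lib/<subdir>/ if it does not exist yet.
        target_dir.mkdir(parents=True, exist_ok=True)
        for model in model_files:
            dest = target_dir / Path(model).name
            shutil.copy2(model, dest)
            written.append(dest)
    return written
```

With the cloud example, a call might look like stage_models(["en-ner-person.bin"], ["example/cloud/node1/solr", "example/cloud/node2/solr"]), assuming the model file is in the current directory. On a real cluster the same copy would have to be done on each node's own filesystem.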

[1] As you noted, ZkSolrResourceLoader delegates to its parent classloader when 
it can’t find resources in a configset, and the parent classloader is set up to 
load from subdirectories and jar files under ${solr.solr.home}/lib/ or under a 
directory specified via a solrconfig.xml <lib> directive.  These directories 
themselves are not included in the set of directories from which resources are 
loaded; only their children are.
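
The rule in this footnote can be modeled with a short Python sketch. This is not Solr's code: the names classpath_roots and find_resource are hypothetical, and jar contents are deliberately not inspected. It only illustrates why, under the rule as stated, a file placed directly in lib/ is not found while one placed in a subdirectory of lib/ is.

```python
from pathlib import Path


def classpath_roots(lib_dir):
    """Return the children of lib_dir that act as resource roots:
    subdirectories and .jar files. lib_dir itself is never a root."""
    lib = Path(lib_dir)
    if not lib.is_dir():
        return []
    return sorted(
        child for child in lib.iterdir()
        if child.is_dir() or child.suffix == ".jar"
    )


def find_resource(lib_dir, name):
    """Look for a resource called `name` under each directory root.

    Jar roots are skipped for brevity. Returns the first match, or None.
    """
    for root in classpath_roots(lib_dir):
        candidate = root / name
        if root.is_dir() and candidate.is_file():
            return candidate
    return None
```

In this model, a file at lib/en-ner-person.bin is invisible because lib/ is not a root, while lib/opennlp/en-ner-person.bin resolves because lib/opennlp/ is.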

--
Steve
www.lucidworks.com

> On Jul 9, 2018, at 10:10 PM, Jerome Yang <jey...@pivotal.io> wrote:
> 
> Hi Steve,
> 
> Putting the models under "${solr.solr.home}/lib/" is not working.
> I checked "ZkSolrResourceLoader"; it seems to first try to find the models in
> the config set.
> If it does not find them there, it uses the class loader to load them from
> resources.
> 
> Regards,
> Jerome
> 
> On Tue, Jul 10, 2018 at 9:58 AM Jerome Yang <jey...@pivotal.io> wrote:
> 
>> Thanks Steve!
>> 
>> 
>> On Tue, Jul 10, 2018 at 5:20 AM Steve Rowe <sar...@gmail.com> wrote:
>> 
>>> Hi Jerome,
>>> 
>>> See the ref guide[1] for a writeup of how to enable uploading files
>>> larger than 1MB into ZooKeeper.
>>> 
>>> Local storage should also work - have you tried placing OpenNLP model
>>> files in ${solr.solr.home}/lib/ ? - make sure you do the same on each node.
>>> 
>>> [1]
>>> https://lucene.apache.org/solr/guide/7_4/setting-up-an-external-zookeeper-ensemble.html#increasing-the-file-size-limit
>>> 
>>> --
>>> Steve
>>> www.lucidworks.com
>>> 
>>>> On Jul 9, 2018, at 12:50 AM, Jerome Yang <jey...@pivotal.io> wrote:
>>>> 
>>>> Hi guys,
>>>> 
>>>> In SolrCloud mode, where should the OpenNLP models be put?
>>>> Should they be uploaded to ZooKeeper?
>>>> In my tests on Solr 7.3.1, an absolute path on the local host does not
>>>> seem to work.
>>>> And a model cannot be uploaded into ZooKeeper if its size exceeds 1MB.
>>>> 
>>>> Regards,
>>>> Jerome
>>>> 
>>>> On Wed, Apr 18, 2018 at 9:54 AM Steve Rowe <sar...@gmail.com> wrote:
>>>> 
>>>>> Hi Alexey,
>>>>> 
>>>>> First, thanks for moving the conversation to the mailing list.  Discussion
>>>>> of usage problems should take place here rather than in JIRA.
>>>>> 
>>>>> I locally set up Solr 7.3 similarly to you and was able to get things to
>>>>> work.
>>>>> 
>>>>> Problems with your setup:
>>>>> 
>>>>> 1. Your update chain is missing the Log and Run update processors at the
>>>>> end (I see these are missing from the example in the javadocs for the
>>>>> OpenNLP NER update processor; I’ll fix that):
>>>>> 
>>>>>    <processor class="solr.LogUpdateProcessorFactory" />
>>>>>    <processor class="solr.RunUpdateProcessorFactory" />
>>>>> 
>>>>>  The Log update processor isn’t strictly necessary, but, from
>>>>>  <https://lucene.apache.org/solr/guide/7_3/update-request-processors.html#custom-update-request-processor-chain>:
>>>>> 
>>>>>      Do not forget to add RunUpdateProcessorFactory at the end of any
>>>>>      chains you define in solrconfig.xml. Otherwise update requests
>>>>>      processed by that chain will not actually affect the indexed data.
>>>>> 
>>>>> 2. Your example document is missing an “id” field.
>>>>> 
>>>>> 3. For whatever reason, the pre-trained model "en-ner-person.bin" doesn’t
>>>>> extract anything from the text “This is Steve Jobs 2”.  It will, however,
>>>>> extract “Steve Jobs” from e.g. the text “This is Steve Jobs in white”.
>>>>> 
>>>>> 4. (Not necessarily a problem) You may want to use a multi-valued “string”
>>>>> field for the “dest” field in your update chain, e.g. “people_str” (“*_str”
>>>>> in the default configset is so configured).
>>>>> 
>>>>> --
>>>>> Steve
>>>>> www.lucidworks.com
>>>>> 
>>>>>> On Apr 17, 2018, at 8:23 AM, Alexey Ponomarenko <alex1989s...@gmail.com> wrote:
>>>>>> 
>>>>>> Hi, once more I am trying to implement named entity extraction using this
>>>>>> manual:
>>>>>> 
>>>>>> https://lucene.apache.org/solr/7_3_0//solr-analysis-extras/org/apache/solr/update/processor/OpenNLPExtractNamedEntitiesUpdateProcessorFactory.html
>>>>>> 
>>>>>> I modified solrconfig.xml like this:
>>>>>> 
>>>>>> <updateRequestProcessorChain name="multiple-extract">
>>>>>> <processor class="solr.OpenNLPExtractNamedEntitiesUpdateProcessorFactory">
>>>>>>   <str name="modelFile">opennlp/en-ner-person.bin</str>
>>>>>>   <str name="analyzerFieldType">text_opennlp</str>
>>>>>>   <str name="source">description_en</str>
>>>>>>   <str name="dest">content</str>
>>>>>> </processor>
>>>>>> </updateRequestProcessorChain>
>>>>>> 
>>>>>> But when I was trying to add data using:
>>>>>> 
>>>>>> *request:*
>>>>>> 
>>>>>> POST http://localhost:8983/solr/numberplate/update?version=2.2&wt=xml&update.chain=multiple-extract
>>>>>> 
>>>>>> <add><doc><field name="description_en">This is Steve Jobs 2
>>>>>> </field><field name="content_pos">This is text 2</field><field
>>>>>> name="content">This is text for content 2</field></doc></add>
>>>>>> 
>>>>>> *response*
>>>>>> 
>>>>>> <?xml version="1.0" encoding="UTF-8"?>
>>>>>> <response>
>>>>>>  <lst name="responseHeader">
>>>>>>      <int name="status">0</int>
>>>>>>      <int name="QTime">3</int>
>>>>>>  </lst>
>>>>>> </response>
>>>>>> 
>>>>>> But I don't see any data inserted into the *content* field or into any
>>>>>> other field.
>>>>>> 
>>>>>> *If you need some additional data I can provide it.*
>>>>>> 
>>>>>> Can you help me? What have I done wrong?
>>>>> 
>>>>> 
>>>> 
>>>> --
>>>> Pivotal Greenplum | Pivotal Software, Inc. <https://pivotal.io/>
>>> 
>>> 
>> 
>> --
>> Pivotal Greenplum | Pivotal Software, Inc. <https://pivotal.io/>
>> 
>> 
> 
> -- 
> Pivotal Greenplum | Pivotal Software, Inc. <https://pivotal.io/>
