Re: Name finder questions

Rodrigo Agerri Fri, 15 Apr 2016 08:26:50 -0700

Hi Robert,

On Fri, Apr 15, 2016 at 10:25 AM, Robert Logue <[email protected]> wrote:
> Hello,
>
> I have just started using OpenNLP in a Java application. I am just getting 
> used to the software and have a couple of newbie questions.
>
> I see for the name finder there is different model data for people and 
> organizations (en-ner-organization.bin and en-ner-person.bin). Is there any 
> way to combine these into one file so I can do 1 search that will give me 
> back person names and organization names. Or is this not possible and is it 
> best to do two searches?


This used to be experimental, but it is not anymore: you can train
a single name finder model for more than one entity type. The
available models were trained on rather old newswire data, so I would
recommend training new models with OpenNLP:

http://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.tool
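
To make the multi-type idea concrete: the name finder training data is
one tokenized sentence per line, with each name wrapped in
<START:type> ... <END> markers, so a single file can mix person and
organization annotations. The sketch below is not OpenNLP code. It is a
plain Java illustration that reads one such line and lists the typed
spans it contains. The class name, the example sentence and the two
type labels are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class AnnotatedLineDemo {

    /** A typed entity over token indexes [start, end), end exclusive. */
    static final class Entity {
        final String type;
        final int start;
        final int end;

        Entity(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + start + "," + end + ")";
        }
    }

    /** Collects the entities marked with <START:type> ... <END> in one whitespace-tokenized line. */
    static List<Entity> entities(String line) {
        List<Entity> found = new ArrayList<>();
        String openType = null; // type of the entity currently open, if any
        int openStart = 0;      // token index where the open entity begins
        int tokenIndex = 0;     // counts real tokens only, not markers
        for (String tok : line.trim().split("\\s+")) {
            if (tok.startsWith("<START:") && tok.endsWith(">")) {
                openType = tok.substring("<START:".length(), tok.length() - 1);
                openStart = tokenIndex;
            } else if (tok.equals("<END>")) {
                if (openType != null) {
                    found.add(new Entity(openType, openStart, tokenIndex));
                    openType = null;
                }
            } else {
                tokenIndex++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical training line mixing two entity types.
        String line = "<START:person> Jack Smith <END> joined "
                + "<START:organization> Acme Corp <END> in 2016 .";
        System.out.println(entities(line));
    }
}
```

A model trained on data annotated like this returns both types from a
single search, so there is no need to run two models and merge the
results.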

I suppose you do not have manually annotated training data, so I would
recommend getting the OntoNotes corpus:

https://catalog.ldc.upenn.edu/LDC2013T19

https://github.com/ontonotes/conll-formatted-ontonotes-5.0

Another option is to use a silver-standard corpus obtained
automatically from Wikipedia:

http://schwa.org/projects/resources/wiki/Wikiner#Automatic-training-data-from-Wikipedia

For Dutch, Spanish, German and Italian there are also free resources
(that I know of). Search for AnCora, SONAR-1, GermEval 2014 and Evalita 2009.

> This question isn't related to the name finder and I don't think it is 
> possible but thought I would ask anyway. If I had two sentences say 'Jack 
> climbed the hill. He was very tired.' Is there any way to know that the 
> pronoun, he, at the start of the second sentence is actually about Jack the 
> subject of the first sentence? I know in this simple case it is obvious but I 
> am wondering if there is anything in the OpenNLP software that will help with 
> this?

The example you mentioned is called "pronominal anaphora", and it is a
special case of the coreference resolution problem. There used to be a
coreference tool in OpenNLP, but it was moved to the sandbox because many
things need to be updated before it can be distributed.

See http://conll.cemantix.org/2012/introduction.html for more details.

HTH,

R
