RE: Name finder questions

Robert Logue Sun, 17 Apr 2016 12:11:20 -0700

I am slightly confused about what I can use the data in those links for. Can I 
use this data with the training tool, like the following?


opennlp TokenNameFinderTrainer -model OUTPUT_FILE_NAME -lang en -data DOWNLOADED_FILE_NAME -encoding UTF-8

Would that give me a better model file to use with the name finder?
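
A note on the data format, offered as a sketch under assumptions rather than a confirmed answer: by default the trainer reads OpenNLP's own name finder format, with one sentence per line, whitespace-separated tokens, and each name wrapped as <START:type> ... <END>. A CoNLL-style download (one token per line with B-/I-/O tags) would therefore likely need converting before it can be passed to -data. The Java class below is a hypothetical helper (BioToOpenNlp and convertSentence are illustrative names, not part of OpenNLP) that converts one sentence of well-formed B-/I-/O tags. Using the lowercased tag label as the entity type is also an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: turns one sentence of
// (token, BIO tag) pairs into a line of name finder training data.
public class BioToOpenNlp {

    public static String convertSentence(List<String[]> taggedTokens) {
        StringBuilder line = new StringBuilder();
        boolean insideEntity = false;
        String currentType = null;

        for (String[] pair : taggedTokens) {
            String token = pair[0];
            String tag = pair[1];

            boolean begins = tag.startsWith("B-");
            // An I- tag only continues an entity of the same type that is
            // already open; a stray I- tag is treated as a plain token.
            boolean continues = insideEntity
                    && tag.startsWith("I-")
                    && tag.substring(2).equals(currentType);

            // Close the open entity before anything that does not continue it.
            if (insideEntity && !continues) {
                line.append(" <END>");
                insideEntity = false;
                currentType = null;
            }

            if (begins) {
                if (line.length() > 0) {
                    line.append(' ');
                }
                currentType = tag.substring(2);
                // Assumption: the lowercased tag label is used as the type name.
                line.append("<START:").append(currentType.toLowerCase()).append('>');
                insideEntity = true;
            }

            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(token);
        }

        // Close an entity that runs to the end of the sentence.
        if (insideEntity) {
            line.append(" <END>");
        }
        return line.toString();
    }

    public static void main(String[] args) {
        List<String[]> sentence = Arrays.asList(
                new String[] {"Jack", "B-PER"},
                new String[] {"Smith", "I-PER"},
                new String[] {"works", "O"},
                new String[] {"at", "O"},
                new String[] {"Acme", "B-ORG"},
                new String[] {".", "O"});
        // prints: <START:per> Jack Smith <END> works at <START:org> Acme <END> .
        System.out.println(convertSentence(sentence));
    }
}
```

Each converted sentence would go on its own line of the file given to -data. Because one file can carry several entity types this way, the same output could in principle be used to train a single model covering both people and organizations.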

Thanks,

Robert

> From: [email protected]
> Date: Fri, 15 Apr 2016 17:12:20 +0200
> Subject: Re: Name finder questions
> To: [email protected]
> 
> Hi Robert,
> 
> On Fri, Apr 15, 2016 at 10:25 AM, Robert Logue <[email protected]> wrote:
> > Hello,
> >
> > I have just started using OpenNLP in a Java application. I am just 
> > getting used to the software and have a couple of newbie questions.
> >
> > I see that the name finder has separate models for people and 
> > organizations (en-ner-organization.bin and en-ner-person.bin). Is there any 
> > way to combine these into one file so that a single search returns both 
> > person names and organization names? Or is this not possible, and is it 
> > best to do two searches?
> 
> This used to be experimental. It is not anymore: you can train
> a name finder model for more than one entity type. The available
> models were trained on rather old newswire data, so I would
> recommend that you train new models using OpenNLP:
> 
> http://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.tool
> 
> I suppose you do not have manually annotated training data, so I would
> recommend getting the OntoNotes corpus:
> 
> https://catalog.ldc.upenn.edu/LDC2013T19
> 
> https://github.com/ontonotes/conll-formatted-ontonotes-5.0
> 
> Another option is to get a silver-standard corpus obtained
> automatically from Wikipedia:
> 
> http://schwa.org/projects/resources/wiki/Wikiner#Automatic-training-data-from-Wikipedia
> 
> For Dutch, Spanish, German and Italian (that I know of) there are free
> resources. Search for Ancora, SONAR-1, GermEval 2014 and Evalita 2009.
> 
> > This question isn't related to the name finder, and I don't think it is 
> > possible, but I thought I would ask anyway. Say I had two sentences: 'Jack 
> > climbed the hill. He was very tired.' Is there any way to know that the 
> > pronoun 'He' at the start of the second sentence refers to Jack, the 
> > subject of the first sentence? I know in this simple case it is obvious, but 
> > I am wondering if there is anything in the OpenNLP software that will help 
> > with this.
> 
> The example you mentioned is called "pronominal anaphora", and it
> generalizes to the coreference resolution problem. There used to be a
> coreference tool in OpenNLP, but it was moved to the Sandbox because many
> things need to be updated before it can be distributed.
> 
> See http://conll.cemantix.org/2012/introduction.html for more details.
> 
> HTH,
> 
> R
